Introduction to Visual Odometry
Brian Holt
CONTENTS
01 Introduction
02 Imaging with a camera
03 Feature detection: Finding points
04 Tracking and matching: Keeping points
05 Camera motion estimation: Epipolar Geometry
06 RANSAC: Handling noisy correspondences
07 A visual odometry pipeline
Introduction
• odos + metron = way + measure
• Measure position relative to a world coordinate frame
• Wheel odometry measures rotations
  • Slippage is difficult to account for
  • Direction is not known
• Visual odometry is useful in many environments
  • Mars rovers (e.g. the Opportunity rover)
  • Autonomous vehicles
SfM, SLAM, VO
[Venn diagram: Visual Odometry within V-SLAM within SfM]
SLAM = VO + Loop Closure + BA
Roadmap
Image courtesy: D. Scaramuzza
Why use a camera?
• Vast information
• Extremely low Size, Weight, and Power (SWaP) footprint
• Cheap and easy to use
• Passive sensor
• Processing power is OK today
It's what nature uses too!
Slide courtesy: S. Weiss
Cellphone-type camera: up to 16 Mp (480 MB/s @ 30 Hz)
Cellphone processor unit: 1.7 GHz quad-core ARM, <10 g
The Camera is a Bearing Sensor
Slide courtesy: S. Weiss
• Projective sensor which measures the bearing of a point with respect to the optical axis
• Depth can be inferred by re-observing a point from different angles
  • The movement (i.e. the angle between the observations) is the point's parallax
• A point at infinity is a feature which exhibits no parallax during camera motion
  • The distance of a star cannot be inferred by moving a few kilometres
  • BUT: it is a perfect bearing reference for attitude estimation: NASA's star tracker sensors are accurate to better than 1 arc second (0.00027 deg)
Image Formation
Let's design a camera
• Idea: put a piece of film in front of an object
• Do we get a reasonable image?
Slide courtesy: C. Stachniss, S. Seitz
Pinhole Camera
Let's design a camera
• Add a barrier to block off most of the rays
• This reduces blurring
• The opening is known as the aperture
• How does this transform the image?
Slide courtesy: C. Stachniss, S. Seitz
Pinhole Camera
• The pinhole camera is a simple model to approximate the imaging process
• If we treat the pinhole as a point, only one ray from any point can enter the camera
Slide courtesy: C. Stachniss; Image courtesy: Forsyth and Ponce
[Figure: virtual image, pinhole, image plane]
Camera Obscura (1544)
Slide courtesy: C. Stachniss; Image courtesy: http://www.acmi.net.au
"Reinerus Gemma-Frisius observed an eclipse of the sun at Louvain on January 24, 1544, and later he used this illustration of the event in his book De Radio Astronomica et Geometrica, 1545. It is thought to be the first published illustration of a camera obscura..."
Hammond, John H., The Camera Obscura, A Chronicle
Camera obscura is Latin for "dark room".
Pinhole Camera Model
• Similarity of gray triangles
• Image scale
• Mapping
Slide courtesy: C. Stachniss; Image courtesy: Foerstner
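The equations on this slide were images in the original deck. A standard reconstruction, assuming the usual notation (focal length $f$, object point $(X, Y, Z)$ in camera coordinates, image point $(x', y')$): the similar gray triangles give the mapping and the image scale $m$,

$$\frac{x'}{f} = \frac{X}{Z}, \qquad \frac{y'}{f} = \frac{Y}{Z}, \qquad m = \frac{f}{Z},$$

so $x' = f\,X/Z$ and $y' = f\,Y/Z$.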
Pinhole Camera Model
• Small hole: sharp image, but requires long exposure times
• Large hole: short exposure times, but blurry images
• Solution: replace the pinhole with lenses
Slide courtesy: C. Stachniss; Image courtesy: Foerstner
Lens Approximates the Pinhole
• A lens is only an approximation of the pinhole camera model
• The corresponding point on the object, the corresponding point in the image, and the centre of the lens should lie on one line
• The further a ray passes from the centre of the lens, the larger the error
• Use an aperture to limit the error (a trade-off between the usable light and the price of the lens)
Slide courtesy: C. Stachniss
Three Assumptions Made in the Pinhole/Thin Lens Model
1. All rays from the object intersect in a single point
2. All image points lie on a plane
3. The ray from the object point to the image point is a straight line
Often these assumptions do not hold, leading to imperfect images.
Slide courtesy: C. Stachniss; Images courtesy: Wikipedia
[Images: chromatic aberration, coma, astigmatism]
Distortion is another common flaw
Slide courtesy: C. Stachniss; Image courtesy: Wikipedia
Distortion is a deviation from rectilinear projection, a projection in which straight lines in a scene remain straight in the image. Common types: barrel distortion, pincushion distortion, mustache distortion.
Perspective Effects
Objects that are closer appear larger
Vanishing Points
Parallel lines converge in the image at vanishing points
Perspective Projection
Slide courtesy: D. Scaramuzza
1. Convert a point in world coordinates to a point in camera coordinates
2. Convert a point in camera coordinates to the image plane
3. Convert a point in the image plane to pixel coordinates
3D Point to 2D Pixel
Slide courtesy: D. Scaramuzza
Convert a point in camera coordinates to the image plane:
1. The point projects onto the image plane
2. Apply similar triangles (see the reconstruction below)
3. The same is true for y
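The projection formulas themselves were images in the deck; the standard similar-triangles result, assuming camera-frame coordinates $(X_c, Y_c, Z_c)$ and image-plane coordinates $(x, y)$:

$$x = f\,\frac{X_c}{Z_c}, \qquad y = f\,\frac{Y_c}{Z_c}.$$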
2D Plane to 2D Pixel
Slide courtesy: D. Scaramuzza
Convert a point in the image plane to pixel coordinates:
1. Account for the pixel coordinates of the optical centre
2. Account for scale factors (see the reconstruction below)
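A standard reconstruction of this step, assuming the optical centre $(u_0, v_0)$ in pixels and scale factors $k_u, k_v$ (pixels per unit length on the sensor):

$$u = u_0 + k_u\,x, \qquad v = v_0 + k_v\,y.$$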
2D Plane to 2D Pixel
Image courtesy: D. Scaramuzza
Convert a point in the image plane to pixel coordinates:
1. Use homogeneous coordinates to map from 3D to 2D
2. $\lambda$ is the scale (see the reconstruction below)
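In homogeneous form (a standard reconstruction; $\lambda = Z_c$ absorbs the perspective division):

$$\lambda \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f k_u & 0 & u_0 \\ 0 & f k_v & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X_c \\ Y_c \\ Z_c \end{bmatrix}.$$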
2D Plane to 2D Pixel
Image courtesy: D. Scaramuzza
Convert a point in the image plane to pixel coordinates:
1. K is a matrix containing the focal lengths and the image origin
2. K is often called the intrinsic or calibration matrix (written out below)
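Naming the matrix from the previous slide (standard notation, with $\alpha_u = f k_u$ and $\alpha_v = f k_v$):

$$K = \begin{bmatrix} \alpha_u & 0 & u_0 \\ 0 & \alpha_v & v_0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad \lambda \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K \begin{bmatrix} X_c \\ Y_c \\ Z_c \end{bmatrix}.$$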
3D Point to 3D Point
Image courtesy: D. Scaramuzza
Convert a point in world coordinates to a point in camera coordinates (see the reconstruction below)
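The transformation appeared only as an image; reconstructed from the editor's notes (slide #28), with the left-hand side (the camera-frame point) inferred from context, the world-to-camera mapping in homogeneous coordinates is

$$\begin{bmatrix} X_c \\ Y_c \\ Z_c \end{bmatrix} = \begin{bmatrix} r_{11} & r_{12} & r_{13} & t_1 \\ r_{21} & r_{22} & r_{23} & t_2 \\ r_{31} & r_{32} & r_{33} & t_3 \end{bmatrix} \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix} = \left[\mathbf{R} \mid T\right] \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}.$$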
Perspective Projection
Image courtesy: D. Scaramuzza
Convert a point in world coordinates to a point in camera coordinates
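Chaining the extrinsic and intrinsic steps gives the full perspective projection from a world point to a pixel (standard form, consistent with the slides above):

$$\lambda \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K \left[\mathbf{R} \mid T\right] \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}.$$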
Roadmap
Image courtesy: D. Scaramuzza
Local Image Features
Slide courtesy: D. Scaramuzza
• How would you align these images?
• Detect point features in both images
• Find corresponding pairs
• Use these pairs to align the images
Matching with Features
Slide courtesy: D. Scaramuzza
• Problem 1: Detect the same points independently in both images. Otherwise there is no chance of a match!
A repeatable feature detector is required.
Matching with Features
Slide courtesy: D. Scaramuzza
• Problem 2: For each point, find the corresponding point in the other image.
A reliable and distinctive feature descriptor is required that is invariant to geometric and illumination changes.
What is a Distinctive Feature?
Slide courtesy: D. Scaramuzza
• Notice how some patches can be matched with higher accuracy
• Patches with detail are good
Point Features: Corners vs Blobs
Slide courtesy: D. Scaramuzza
Finding and Tracking Points
Image courtesy: OpenCV

import numpy as np
import cv2
…
# Take the first frame and find corners in it
ret, old_frame = cap.read()
old_gray = cv2.cvtColor(old_frame, cv2.COLOR_BGR2GRAY)
p0 = cv2.goodFeaturesToTrack(old_gray, mask=None, **feature_params)
…
while True:
    ret, frame = cap.read()
    frame_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # calculate optical flow
    p1, st, err = cv2.calcOpticalFlowPyrLK(old_gray, frame_gray, p0, None, **lk_params)

• The Lucas-Kanade tracker uses optical flow (a fuller runnable sketch follows below)
• It assumes the pixel intensities of an object do not change between frames
• It assumes neighbouring pixels have similar motion
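A minimal self-contained version of this loop; "video.mp4" is a placeholder path, and the parameter values are the typical choices from the OpenCV tutorial this excerpt comes from:

import numpy as np
import cv2

cap = cv2.VideoCapture("video.mp4")  # placeholder path

# Shi-Tomasi corner detection and Lucas-Kanade tracker parameters
feature_params = dict(maxCorners=100, qualityLevel=0.3, minDistance=7, blockSize=7)
lk_params = dict(winSize=(15, 15), maxLevel=2,
                 criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 0.03))

ret, old_frame = cap.read()
old_gray = cv2.cvtColor(old_frame, cv2.COLOR_BGR2GRAY)
p0 = cv2.goodFeaturesToTrack(old_gray, mask=None, **feature_params)

while True:
    ret, frame = cap.read()
    if not ret:
        break  # end of video
    frame_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Calculate optical flow from the previous frame to the current one
    p1, st, err = cv2.calcOpticalFlowPyrLK(old_gray, frame_gray, p0, None, **lk_params)
    # Keep only the points that were tracked successfully
    good_new = p1[st == 1]
    # Roll the state forward for the next iteration
    old_gray = frame_gray.copy()
    p0 = good_new.reshape(-1, 1, 2)

cap.release()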
Finding and Matching Points
Image courtesy: OpenCV

import numpy as np
import cv2
…
# Initiate the ORB detector
orb = cv2.ORB_create()
# Find the keypoints and descriptors with ORB
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)
…
# Create a BFMatcher object (Hamming distance suits binary ORB descriptors)
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
# Match descriptors
matches = bf.match(des1, des2)
# Sort them in the order of their distance
matches = sorted(matches, key=lambda x: x.distance)
# Draw the first 10 matches
img3 = cv2.drawMatches(img1, kp1, img2, kp2, matches[:10], None, flags=2)

• OpenCV supports brute-force and FLANN-based (approximate nearest-neighbour) matching; a ratio-test sketch follows below
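When using knnMatch (brute-force or FLANN), Lowe's ratio test is the usual filter for ambiguous matches. A short sketch: good_matches is a hypothetical helper, and the 0.75 threshold is a common convention, not something prescribed by the slides:

import cv2

def good_matches(des1, des2, ratio=0.75):
    """Match binary descriptors and keep matches that pass Lowe's ratio test."""
    bf = cv2.BFMatcher(cv2.NORM_HAMMING)  # no crossCheck when using knnMatch
    knn = bf.knnMatch(des1, des2, k=2)
    good = []
    for pair in knn:
        # Keep a match only if it is clearly better than the runner-up
        if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
            good.append(pair[0])
    return good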
Roadmap
Image courtesy: D. Scaramuzza
2 View Geometry
Image courtesy: Auckland University
• Goal: estimate the 3D scene structure, the camera poses (up to a scale factor) and the camera intrinsics
2 cases:
• Calibrated cameras: K matrices are known
• Uncalibrated cameras: K matrices are unknown
[Figure: two views related by rotation R and translation T]
2 View Geometry
Image courtesy: Auckland University
• Goal: estimate the 3D scene structure, the scale and the camera intrinsics
• Find R, T, P^i that satisfy the two-view projection constraints (reconstructed under "Calibrated Cameras" below)
Scale Ambiguity in Monocular Vision
Image courtesy: Amsterdam City Tours
• Rescaling the scene by a constant factor produces the same image
• We cannot recover the scale!
• Only 5 degrees of freedom: 3 for rotation, 2 for the direction of translation (but no scale)
Calibrated Cameras
• Use normalised image coordinates
• Find R, T, P^i that satisfy the projection constraints reconstructed below
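The constraints appeared as images in the deck. The left-camera equation is reconstructed from the editor's notes (slide #44); the right-camera analogue with $[\mathbf{R} \mid T]$ is the standard companion and is an assumption here. For each point $i$, in normalised coordinates with the world frame attached to the left camera:

$$\lambda_l \begin{bmatrix} \bar{u}^i_l \\ \bar{v}^i_l \\ 1 \end{bmatrix} = \left[\mathbf{I} \mid 0\right] \begin{bmatrix} X^i_w \\ Y^i_w \\ Z^i_w \\ 1 \end{bmatrix}, \qquad \lambda_r \begin{bmatrix} \bar{u}^i_r \\ \bar{v}^i_r \\ 1 \end{bmatrix} = \left[\mathbf{R} \mid T\right] \begin{bmatrix} X^i_w \\ Y^i_w \\ Z^i_w \\ 1 \end{bmatrix}.$$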
Essential Matrix
Slide courtesy: D. Scaramuzza
• $p_l$, $p_r$, $T$ form a plane (they are coplanar)
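This coplanarity is the epipolar constraint; in its standard form (up to sign and frame conventions), with $[T]_\times$ the skew-symmetric cross-product matrix and $p_l$, $p_r$ in normalised coordinates:

$$p_r^\top \mathbf{E}\, p_l = 0, \qquad \mathbf{E} = [T]_\times \mathbf{R}.$$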
Fundamental Matrix
Slide courtesy: D. Scaramuzza
• The same coplanarity constraint, expressed directly in pixel coordinates (per the editor's notes, use the fundamental matrix when only pixel coordinates are available)
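In standard form, with $\tilde{p}_l$, $\tilde{p}_r$ in pixel coordinates and $K_l$, $K_r$ the intrinsic matrices:

$$\tilde{p}_r^\top \mathbf{F}\, \tilde{p}_l = 0, \qquad \mathbf{F} = K_r^{-\top}\, \mathbf{E}\, K_l^{-1}.$$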
Recovering Pose
Slide courtesy: S. Weiss
• Recover R, T from the essential matrix
• The decomposition yields multiple candidate solutions; only one places the point in front of both cameras
• Apply motion constancy (a Kalman filter?)
RANSAC
Slide courtesy: S. Weiss
Assume:
• The model parameters can be estimated from N data items (e.g. the essential matrix from 5-8 point correspondences)
• There are M data items in total
The algorithm:
1. Select N data items at random
2. Estimate the model parameters (linear or nonlinear least squares, or other)
3. Count how many of the M data items fit the model within a user-given tolerance T. Call this K. If K is the largest (best fit) so far, accept the model.
Repeat steps 1-3, S times. (A sketch follows below.)
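A minimal generic sketch of this loop; the fit and error callbacks are placeholders, and the tolerance T and trial count S are the user-chosen values flagged as open questions in the editor's notes:

import random

def ransac(data, N, S, T, fit, error):
    """Generic RANSAC: fit models to random minimal samples and keep the
    model with the most inliers within tolerance T."""
    best_model, best_K = None, 0
    for _ in range(S):
        sample = random.sample(data, N)   # 1. select N data items at random
        model = fit(sample)               # 2. estimate the model parameters
        # 3. count data items that fit the model within tolerance T
        K = sum(1 for d in data if error(model, d) < T)
        if K > best_K:                    # best fit so far: accept it
            best_model, best_K = model, K
    return best_model, best_K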
Fundamental Matrix Song
https://www.youtube.com/watch?v=DgGV3l82NTk
Roadmap
Image courtesy: D. Scaramuzza
Putting It All Together
Image courtesy: A. Singh

import numpy as np
import cv2
…
while True:
    …
    # Calculate the essential matrix from matched/tracked points
    E, mask = cv2.findEssentialMat(points2, points1, focal, pp, cv2.RANSAC, 0.999, 1.0)
    # Recover R and t from the essential matrix
    _, R, t, mask = cv2.recoverPose(E, points2, points1, focal=focal, pp=pp, mask=mask)
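To turn per-frame (R, t) into a trajectory, monocular VO composes the relative motions. A minimal sketch consistent with the pipeline above; update_pose is a hypothetical helper, and the absolute scale must come from an external source (e.g. a speedometer or ground truth), since monocular VO cannot recover it:

import numpy as np

R_total = np.eye(3)          # accumulated rotation since the first frame
t_total = np.zeros((3, 1))   # accumulated position since the first frame

def update_pose(R, t, scale=1.0):
    """Compose a new relative motion (R, t) into the global pose."""
    global R_total, t_total
    t_total = t_total + scale * (R_total @ t)   # translate in the world frame
    R_total = R @ R_total                       # then update the orientation
    return R_total, t_total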
Putting It All Together
Image courtesy: A. Singh
https://www.youtube.com/watch?v=homos4vd_Zs
Resources
https://avisingh599.github.io/vision/monocular-vo/
http://frc.ri.cmu.edu/~kaess/vslam_cvpr14/media/VSLAM-Tutorial-CVPR14-A11-VisualOdometry.pdf
Davide Scaramuzza's home page: http://rpg.ifi.uzh.ch
Editor's Notes

  • #28: The world-to-camera transform (the left-hand side, the camera-frame point, was an image):

$$\begin{bmatrix} r_{11} & r_{12} & r_{13} & t_1 \\ r_{21} & r_{22} & r_{23} & t_2 \\ r_{31} & r_{32} & r_{33} & t_3 \end{bmatrix} \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix} = \left[\mathbf{R} \mid T\right] \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}$$
  • #44: The left-camera projection constraint:

$$\lambda_l \begin{bmatrix} \bar{u}^i_l \\ \bar{v}^i_l \\ 1 \end{bmatrix} = \left[\mathbf{I} \mid 0\right] \begin{bmatrix} X^i_w \\ Y^i_w \\ Z^i_w \\ 1 \end{bmatrix}$$
  • #46: If we only have pixel coordinates for matches, then use the fundamental matrix. Use the essential matrix if we have normalised image coordinates (i.e. the projection before the intrinsics are applied).
  • #49: Questions: What is the tolerance T? How many trials S ensure success?