1
Fast Multi-frame Stereo Scene Flow
with Motion Segmentation
Tatsunori Taniai*
RIKEN AIP
Sudipta N. Sinha
Microsoft Research
Yoichi Sato
The University of Tokyo
CVPR 2017 Paper
* Work done during internship at Microsoft Research and partly at the University of Tokyo.
3
Contributions
• New unified framework
– Stereo (depth / disparity)
– Optical flow (2D motion field)
– Motion segmentation (binary mask of moving objects)
– Visual odometry (6 DoF camera ego-motion)
In our framework
• The result of each task benefits the others, leading to higher accuracy and efficiency
• The joint task is decomposed into simple optimization problems (in contrast to existing joint methods)
Results
• Accurate: ranked 3rd on the KITTI benchmark
• Fast: 10-1000x faster than state-of-the-art methods
5
Scene Flow: Problem Definition
[Figure: a 3D point 𝑿_t = (x_t, y_t, z_t) projects to pixel p in the left image 𝐼_t^0 and to p′ in the right image 𝐼_t^1]
Stereo disparity: a 1D horizontal translation determined by the object depth z.
6
Scene Flow: Problem Definition
[Figure: images 𝐼_t^0, 𝐼_t^1, 𝐼_{t+1}^0, 𝐼_{t+1}^1; the point 𝑿_t moves to 𝑿_{t+1}, and its projection moves from p to p′]
Optical flow: a 2D translation caused by camera and object motions.
7
Scene Flow: Problem Definition
[Figure: point 𝑿_t at time t and 𝑿_{t+1} at time t+1, observed in images 𝐼_t^0 and 𝐼_{t+1}^0]
Stereo disparity: a 1D horizontal translation by the object depth z.
Optical flow: a 2D translation by camera and object motions.
Together they implicitly represent the 3D motion of scene points.
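The statement above can be made concrete for a calibrated, rectified stereo rig: disparity gives depth, depth plus flow gives corresponding 3D points at t and t+1, and their difference is the 3D scene flow. The focal length f, baseline B, and principal point c below are assumed calibration values, not from the slides.

```python
import numpy as np

def scene_flow_from_disparity_and_flow(p, d_t, d_t1, flow, f, B, c):
    """Recover the 3D motion of one pixel from disparity and optical flow.

    p:    pixel (x, y) in the left image at time t
    d_t:  disparity of p at time t
    d_t1: disparity of the corresponding pixel at time t+1
    flow: optical flow (u, v) of p between t and t+1
    f:    focal length in pixels, B: stereo baseline, c: principal point (cx, cy)
    """
    def backproject(px, d):
        z = f * B / d                      # depth from disparity
        x = (px[0] - c[0]) * z / f
        y = (px[1] - c[1]) * z / f
        return np.array([x, y, z])

    X_t = backproject(p, d_t)              # 3D point at time t
    p_next = (p[0] + flow[0], p[1] + flow[1])
    X_t1 = backproject(p_next, d_t1)       # 3D point at time t+1
    return X_t1 - X_t                      # 3D scene flow vector
```

For example, a pixel at the principal point whose disparity halves between frames has moved purely away from the camera.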
8
Applications
Autonomous driving
[Menze+ CVPR 15]
Action recognition
[Wang+ CVPR 11]
Depth and flow map sequences are useful in many applications
But optical flow estimation is VERY SLOW.
9
Overview
• Introduction
• Motivation
• Proposed method
• Experiments
10
Optical Flow vs Stereo
                Optical flow                      Stereo matching
Search space    2D translation                    1D translation
Motion factor   Object motion, ego-motion, etc.   Object depth
Optical flow is much more difficult & expensive than stereo
11
Dominant Rigid Scene Assumption
Most points in the scene are static; their flow is caused solely by camera motion.
12
Flow Estimation by Depth and Camera Motion
[Figure: rigid flow map, ground-truth flow map, and error map of the rigid flow, computed from images 𝐼_t, 𝐼_{t+1} and surfaces D_t, D_{t+1}]
Given the rigid flow map, we only need to recompute flow for moving objects.
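The rigid flow map itself follows from the depth map and the camera motion: back-project each pixel with its depth, apply the camera motion, and re-project. A minimal sketch, assuming a pinhole intrinsics matrix K and a motion (R, t); in the paper this flow comes from the estimated disparity D and ego-motion 𝐏.

```python
import numpy as np

def rigid_flow(depth, K, R, t):
    """Flow induced purely by camera motion (R, t) for a static scene.

    depth: HxW depth map at time t
    K: 3x3 camera intrinsics; (R, t): camera motion from frame t to t+1.
    """
    H, W = depth.shape
    ys, xs = np.mgrid[0:H, 0:W].astype(np.float64)
    ones = np.ones_like(xs)
    pix = np.stack([xs, ys, ones], axis=0).reshape(3, -1)   # homogeneous pixels
    rays = np.linalg.inv(K) @ pix                           # back-project to rays
    X = rays * depth.reshape(1, -1)                         # 3D points at time t
    X1 = R @ X + t.reshape(3, 1)                            # move into frame t+1
    p1 = K @ X1
    p1 = p1[:2] / p1[2:3]                                   # perspective projection
    flow = p1 - pix[:2]                                     # 2D displacement
    return flow.reshape(2, H, W)
```

With an identity motion the flow is zero everywhere, as expected for a static camera.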
13
Overview
• Introduction
• Motivation
• Proposed method
• Experiments
14
Proposed Approach
Pipeline (input → output per stage):
1. Binocular stereo:            𝐼_t^0, 𝐼_t^1                        → D (init. disparity)
2. Visual odometry:             𝐼_t^0, 𝐼_{t+1}^0 + D                → 𝐏 (ego-motion)
3. Epipolar stereo:             𝐼_{t±1}^{0,1}, 𝐼_t^0, 𝐼_t^1 + D, 𝐏  → D (disparity)
4. Initial motion segmentation: 𝐼_{t±1}^{0,1}, 𝐼_t^0, 𝐼_t^1 + D, 𝐏  → F_rig (rigid flow), S (init. seg.)
5. Optical flow:                𝐼_t^0, 𝐼_{t+1}^0 + S                → F_non (non-rigid flow)
6. Flow fusion:                 𝐼_t^0, 𝐼_{t+1}^0 + F_rig, F_non     → F (flow), S (motion seg.)
15
Optimization Strategy
𝐸(𝚯) = Σ_p ‖ 𝐼_t^0(p) − 𝐼_{t+1}^0( w(p; 𝚯) ) ‖
Minimize image residuals.
16
Optimization Strategy
𝐸(D, 𝐏, S, F_non) = Σ_p ‖ 𝐼_t^0(p) − 𝐼_{t+1}^0( w(p; D, 𝐏, S, F_non) ) ‖
Minimize image residuals by gradually increasing the complexity of the warping model:
w(p) → w(p; D, 𝐏) → w(p; D, 𝐏, S, F_non)
i.e., from rigid warping (binocular stereo, visual odometry, epipolar stereo) to partially non-rigid warping (initial motion segmentation, optical flow, flow fusion).
17
Intermediate Step: Binocular Stereo
[Pipeline diagram with the Binocular stereo stage highlighted; output: D (init. disparity)]
18
Intermediate Step: Binocular Stereo
[Pipeline diagram with the Binocular stereo stage highlighted]
[Figure: initial disparity map, left-right occlusion map, uncertainty map]
• SGM stereo with NCC-based matching costs
• Left-right consistency check → occlusion map
• Uncertainty map using [Drory+ 2014] (no computational overhead)
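The NCC-based matching cost can be sketched as a zero-mean normalized cross-correlation over a patch; the window shape and the mapping from correlation score to cost are assumptions here, not the paper's exact setup.

```python
import numpy as np

def ncc(patch_a, patch_b, eps=1e-6):
    """Zero-mean normalized cross-correlation between two image patches.

    Returns a score in [-1, 1]; a matching cost can then be taken as
    (1 - NCC) / 2. Being mean- and gain-normalized, NCC is robust to
    brightness and contrast changes between the two views.
    """
    a = patch_a - patch_a.mean()
    b = patch_b - patch_b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum()) + eps
    return float((a * b).sum() / denom)
```

Identical patches score near 1 even after a gain/bias change, which is why NCC is a common choice for stereo matching under exposure differences.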
19
Intermediate Step: Visual Odometry
[Pipeline diagram with the Visual odometry stage highlighted; output: 𝐏 (ego-motion)]
20
Intermediate Step: Visual Odometry
[Pipeline diagram with the Visual odometry stage highlighted]
min_𝐏 𝐸(𝐏 | D) = Σ_p w(p) ρ( 𝐼_t(p) − 𝐼_{t+1}( w(p; D, 𝐏) ) ),  ρ: robust penalty function
Using [Alismail+ CMU-TR14]:
• Estimate the 6-DoF camera motion by directly minimizing image residuals under rigid warping
• Iteratively reweighted least squares (Lucas-Kanade + inverse compositional)
• Down-weight moving-object regions predicted by the previous flow F_{t−1} and mask S_{t−1}
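One reweighting step of the IRLS scheme can be sketched as below. The slide only states that a robust penalty ρ is used, so the Huber weight function and the threshold k are illustrative assumptions, not the paper's exact choice.

```python
import numpy as np

def irls_weights(residuals, k=1.345):
    """One reweighting step of iteratively reweighted least squares.

    Huber weights: w(r) = 1 for |r| <= k, and k/|r| otherwise, so large
    residuals (e.g. on moving objects) contribute less to the next
    least-squares pose update.
    """
    r = np.abs(residuals)
    w = np.ones_like(r, dtype=float)
    large = r > k
    w[large] = k / r[large]
    return w
```

In the full solver these weights would multiply the linearized photometric residuals each Lucas-Kanade iteration, alongside the moving-object down-weighting from F_{t−1} and S_{t−1}.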
21
Intermediate Step: Epipolar Stereo
[Pipeline diagram with the Epipolar stereo stage highlighted; output: D (disparity)]
22
Intermediate Step: Epipolar Stereo
[Pipeline diagram with the Epipolar stereo stage highlighted]
Left-right matching between 𝐼_t^0 and 𝐼_t^1 is unreliable under occlusion.
• Blend matching costs with the four adjacent frames 𝐼_{t±1}^{0,1} (using the estimated poses 𝐏_t, 𝐏_{t−1})
• High uncertainty → high weights on adjacent-frame matching
[Figure: occlusion map and uncertainty map]
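The uncertainty-driven blending can be sketched as a per-pixel convex combination of cost volumes; the linear rule below is an assumption for illustration, not the paper's exact weighting scheme.

```python
import numpy as np

def blend_costs(cost_lr, costs_adj, uncertainty):
    """Blend left-right matching costs with adjacent-frame costs.

    cost_lr:     HxWxD cost volume from matching I_t^0 against I_t^1
    costs_adj:   list of HxWxD cost volumes against adjacent frames
    uncertainty: HxW map in [0, 1]; high uncertainty shifts weight from the
                 left-right costs to the adjacent-frame costs.
    """
    adj = np.mean(costs_adj, axis=0)         # average the adjacent-frame volumes
    u = uncertainty[..., None]               # broadcast over the disparity axis
    return (1.0 - u) * cost_lr + u * adj
```

Pixels flagged as occluded or uncertain thus rely mostly on temporally adjacent views, where the occluder may not block the match.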
23
Intermediate Step: Epipolar Stereo
[Pipeline diagram with the Epipolar stereo stage highlighted]
[Figure: initial disparity map, final disparity map, ground truth]
• Run SGM stereo again using the blended matching costs
• Disparities are improved in occluded regions
24
Intermediate Step: Initial Motion Segmentation
[Pipeline diagram with the Initial motion segmentation stage highlighted; outputs: F_rig (rigid flow), S (initial segmentation)]
25
Intermediate Step: Initial Motion Segmentation
[Pipeline diagram with the Initial motion segmentation stage highlighted]
• Predict moving-object regions where the rigid flow proposal is inaccurate
[Figure: rigid flow proposal, initial segmentation, ground truth]
26
Intermediate Step: Initial Motion Segmentation
[Pipeline diagram with the Initial motion segmentation stage highlighted]
• Predict moving-object regions where the rigid flow proposal is inaccurate
• Use image residuals as soft seeds in GrabCut-based segmentation
[Figure: rigid flow proposal, image residual, initial segmentation]
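Turning residuals into soft seeds might look like the sketch below; the thresholds and the ternary seed encoding are hypothetical. The paper feeds such soft seeds into a GrabCut-style segmentation with spatial regularization rather than thresholding directly.

```python
import numpy as np

def soft_seeds(residual, lo=0.05, hi=0.2):
    """Turn per-pixel image residuals into soft segmentation seeds.

    Pixels with residual above `hi` are likely moving (foreground seeds),
    those below `lo` are likely static (background seeds), and the rest
    remain unknown for the segmentation to decide.
    """
    seeds = np.zeros(residual.shape, dtype=np.int8)  # 0 = unknown
    seeds[residual > hi] = 1                         # probable foreground
    seeds[residual < lo] = -1                        # probable background
    return seeds
```

The seeds being "soft" matters: a noisy residual pixel only biases, rather than fixes, the final label.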
27
Intermediate Step: Optical Flow
[Pipeline diagram with the Optical flow stage highlighted; output: F_non (non-rigid flow)]
28
Intermediate Step: Optical Flow
[Pipeline diagram with the Optical flow stage highlighted]
[Figure: initial segmentation, non-rigid flow proposal]
• Estimate non-rigid flow for the predicted moving-object regions
• Extend the SGM algorithm to optical flow
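The SGM recursion being extended can be sketched in 1D: costs are aggregated along a scanline with small penalties P1 for label changes of ±1 and a larger penalty P2 for bigger jumps. This is a minimal sketch of the classic recursion; the paper's extension from 1D disparities to 2D flow labels is not shown here.

```python
import numpy as np

def sgm_aggregate_1d(cost, P1=1.0, P2=8.0):
    """Aggregate matching costs along one scanline, SGM-style.

    cost: N x D array (N pixels along the scanline, D candidate labels).
    Recursion per pixel p and label d:
      L(p, d) = C(p, d) + min( L(p-1, d),
                               L(p-1, d-1) + P1, L(p-1, d+1) + P1,
                               min_k L(p-1, k) + P2 ) - min_k L(p-1, k)
    """
    N, D = cost.shape
    L = np.empty((N, D), dtype=float)
    L[0] = cost[0]
    for p in range(1, N):
        prev = L[p - 1]
        m = prev.min()
        best = prev.copy()                                # keep the same label
        best[1:] = np.minimum(best[1:], prev[:-1] + P1)   # come from label d-1
        best[:-1] = np.minimum(best[:-1], prev[1:] + P1)  # come from label d+1
        L[p] = cost[p] + np.minimum(best, m + P2) - m     # large jumps pay P2
    return L
```

In full SGM this aggregation runs along several directions and the results are summed before a winner-take-all label selection.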
29
Intermediate Step: Flow Fusion
[Pipeline diagram with the Flow fusion stage highlighted; outputs: F (flow), S (motion segmentation)]
30
Intermediate Step: Flow Fusion
[Pipeline diagram with the Flow fusion stage highlighted]
[Figure: rigid flow proposal + non-rigid flow proposal → final flow map and final motion segmentation]
• Fusion is cast as a binary labeling (white: non-rigid, black: rigid)
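A greedy per-pixel version of this fusion can be sketched as below: pick, at each pixel, the proposal with the lower warping residual. This simplification drops the spatial smoothness of the actual binary-labeling optimization, but shows how the fused flow F and the motion segmentation S fall out of the same decision.

```python
import numpy as np

def fuse_flows(rigid_flow, nonrigid_flow, res_rigid, res_nonrigid):
    """Fuse rigid and non-rigid flow proposals by per-pixel binary labeling.

    rigid_flow, nonrigid_flow: HxWx2 flow proposals
    res_rigid, res_nonrigid:   HxW warping residuals of each proposal
    Returns the fused flow F and the binary motion segmentation S
    (True = non-rigid, i.e. a moving-object pixel).
    """
    use_nonrigid = res_nonrigid < res_rigid          # pixel-wise label choice
    mask = use_nonrigid[..., None]                   # broadcast over (u, v)
    fused = np.where(mask, nonrigid_flow, rigid_flow)
    return fused, use_nonrigid
```

The segmentation mask is thus a by-product of fusion, which is how the final motion segmentation on the slide arises.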
31
Overview
• Introduction
• Motivation
• Proposed method
• Experiments
32
KITTI 2015 Scene Flow Benchmark
Our method is ranked 3rd (November 2016)
200 road scenes with multiple moving objects
35
Summary of This Research
• New unified framework
– Stereo (depth / disparity)
– Optical flow (2D motion field)
– Motion segmentation (binary mask of moving objects)
– Visual odometry (6 DoF camera ego-motion)
• Accurate: ranked 3rd on the KITTI benchmark
• Fast: 10-1000x faster than state-of-the-art methods
36
Take-home message: RIKEN AIP is a wonderful place for young researchers and students.
Contact me about internship opportunities.
[Photos: Nihonbashi office; 24 DGX-1 machines]