MegaDepth: Learning Single-View Depth
Prediction from Internet Photos
CVPR2018
Zhengqi Li Noah Snavely
AI Research Institute (인공지능연구원)
Kwanghee Lee (이광희)
2
 Limitations in the available training data
• NYU: Indoor-only images
• Make3D: Small number of training examples
• KITTI: Sparse sampling
• (RGB image, depth map) pairs are difficult to collect
• RGB-D sensors (Kinect): limited to indoor use
• LIDAR: sparse depth maps
 Contributions : MegaDepth
• Multi-view internet photo collections (a virtually unlimited data source)
• Generate training data via modern structure-from-motion (SfM) and multi-view stereo (MVS)
• Challenges: noise and unreconstructable objects
• New data-cleaning methods & automatic augmentation with ordinal depth relations generated via
semantic segmentation
• High accuracy & Generalizability
Motivation
3
 Download Internet photos from Flickr (Landmarks10K dataset)
 COLMAP: state-of-the-art SfM and MVS system
 Camera poses, sparse point clouds, dense depth maps
The MegaDepth Dataset
4
 Raw depth maps from COLMAP
• Transient objects (people, cars, etc.)
• Noisy depth discontinuities
• Bleeding of background depths into foreground objects
 Modified COLMAP (both steps sketched below)
• At each iteration, keep the smaller (closer) of the two depth values at each pixel
• Apply a median filter to remove unstable depth values
Depth map refinement
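As an illustration only (not the authors' code), the two refinement steps above might look like the NumPy/SciPy sketch below; `depth_maps` is a hypothetical list of per-iteration float depth estimates for one image, zeros mark missing values, and the kernel size and deviation threshold are assumptions.

```python
import numpy as np
from scipy.ndimage import median_filter

def refine_depth(depth_maps, kernel=5, rel_tol=0.05):
    """Illustrative sketch (not the paper's code) of the two steps above:
    (1) across MVS iterations, keep the smaller (closer) depth at each pixel;
    (2) compare against a median-filtered map and drop pixels that deviate
        too much, treated here as 'unstable' depths.
    Zeros mark pixels with no depth estimate; inputs are float arrays."""
    merged = np.full_like(depth_maps[0], np.inf)
    for d in depth_maps:
        valid = d > 0
        merged[valid] = np.minimum(merged[valid], d[valid])
    merged[np.isinf(merged)] = 0.0  # pixels never reconstructed

    # Median filter over the merged map (missing pixels are zero here;
    # a real pipeline would mask them before filtering).
    smoothed = median_filter(merged, size=kernel)
    unstable = np.abs(merged - smoothed) > rel_tol * np.maximum(smoothed, 1e-6)
    merged[unstable] = 0.0
    return merged
```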
5
 Depth enhancement via semantic segmentation
• Semantic filtering (transient objects & difficult-to-reconstruct objects)
- PSPNet (150 semantic categories)
- Divide pixels into three subsets: foreground (F), background (B), and sky (S)
- If <50% of the pixels in a connected component C (of a foreground class F) have reconstructed depth, discard all depths in C
• Euclidean vs. ordinal depth (filtering training data)
- If >30% of an image I consists of valid depth values, keep that image as training data
for learning Euclidean depth
• Automatic ordinal depth labeling (a sketch of these rules follows below)
- F_ord: connected components C (of F) whose area is larger than 5% of the image
- B_ord: pixels p whose component C (of B) is larger than 5% of the image
& whose valid depth lies in the last quartile
of the full range of depths for I
Depth map refinement
F: statues, fountains, people, cars
B: buildings, towers, mountains, etc.
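A hedged sketch of how the three rules on this slide could be applied to one image; `depth` (MVS depth map, 0 = missing), `seg` (PSPNet label map), and the `fg_classes` / `bg_classes` groupings are hypothetical inputs, and the thresholds simply restate the slide's numbers.

```python
import numpy as np
from scipy.ndimage import label

def filter_and_label(depth, seg, fg_classes, bg_classes,
                     min_fg_recon=0.5, min_comp_area=0.05, min_valid=0.3):
    """Hedged sketch (not the paper's code) of semantic filtering and
    automatic ordinal labeling. Returns the cleaned depth map, whether the
    image is kept for Euclidean-depth training, and the F_ord / B_ord masks."""
    img_area = depth.size
    f_ord = np.zeros(depth.shape, dtype=bool)
    b_ord = np.zeros(depth.shape, dtype=bool)

    # Semantic filtering: per connected foreground component C, if <50% of
    # its pixels have reconstructed depth, discard all depths in C. Large
    # foreground components also join the ordinal set F_ord.
    for c in fg_classes:
        comps, n = label(seg == c)
        for k in range(1, n + 1):
            mask = comps == k
            if (depth[mask] > 0).mean() < min_fg_recon:
                depth[mask] = 0.0
            if mask.sum() > min_comp_area * img_area:
                f_ord |= mask

    # Ordinal background set B_ord: pixels in large background components
    # whose valid depth lies in the last quartile of the image's depth range.
    valid = depth > 0
    if valid.any():
        lo, hi = depth[valid].min(), depth[valid].max()
        far = depth >= lo + 0.75 * (hi - lo)
        for c in bg_classes:
            comps, n = label(seg == c)
            for k in range(1, n + 1):
                mask = comps == k
                if mask.sum() > min_comp_area * img_area:
                    b_ord |= mask & valid & far

    # Euclidean vs. ordinal: keep the image for Euclidean-depth training
    # only if >30% of its pixels still carry valid depth after filtering.
    keep_euclidean = (depth > 0).mean() > min_valid
    return depth, keep_euclidean, f_ord, b_ord
```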
6
 200 3D models from landmarks around the world
 150K reconstructed images
 After filtering: 130K valid images
 Euclidean depth data: 100K images
 Ordinal depth data: 30K images
 Additional dataset: images from [18]
Creating a dataset
MegaDepth(MD)
[18] Tanks and Temples: Benchmarking Large-Scale Scene Reconstruction (SIGGRAPH 2017)
7
 Network architecture
• VGG
• ResNet
• Hourglass network
Depth estimation network
(Figure: hourglass network)
8
 Unknown scale factor: cannot compare predicted and ground truth depths
directly
 The ratios of pairs of depths are preserved under scaling.
 In the log-depth domain, the differences between pairs of log-depths are preserved (scaling becomes an additive offset)
Scale-invariant Loss function
Three terms (combined loss sketched below):
• Scale-invariant data term
• Multi-scale scale-invariant gradient matching term
• Robust ordinal depth loss
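The three terms combine into one training loss; a sketch of its form (α and β are weighting hyperparameters set in the paper, and L_data, L_grad, L_ord are detailed on the next slides):

```latex
% Total training loss: weighted sum of the three terms above.
\mathcal{L} \;=\; \mathcal{L}_{\mathrm{data}} \;+\; \alpha\,\mathcal{L}_{\mathrm{grad}} \;+\; \beta\,\mathcal{L}_{\mathrm{ord}}
```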
9
 Scale-invariant data term
Loss Function
Notation: L = predicted log-depth map, L* = GT log-depth map (a reconstruction of the term follows below)
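A reconstruction of the data term from the definitions above, assuming the notation L_i for the predicted log depth at pixel i, L*_i for the ground-truth (MVS) log depth, R_i = L_i − L*_i, and n for the number of pixels with valid ground-truth depth:

```latex
% Scale-invariant data term (Eigen-style): invariant to adding a constant
% to all log depths, i.e. to a global scale factor on depth.
\mathcal{L}_{\mathrm{data}}
  \;=\; \frac{1}{n}\sum_{i=1}^{n} R_i^{\,2}
  \;-\; \frac{1}{n^{2}}\Bigl(\sum_{i=1}^{n} R_i\Bigr)^{\!2},
  \qquad R_i = L_i - L^{*}_i
```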
10
 Multi-scale scale-invariant gradient matching term
Loss Function
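A hedged reconstruction of the gradient term: an l1 penalty on the horizontal and vertical gradients of the log-depth residual R, summed over K scales (R^k denotes the residual downsampled to scale k):

```latex
% Multi-scale scale-invariant gradient matching term.
\mathcal{L}_{\mathrm{grad}}
  \;=\; \frac{1}{n}\sum_{k=1}^{K}\sum_{i}
        \bigl(\,\lvert \nabla_{x} R^{k}_{i} \rvert + \lvert \nabla_{y} R^{k}_{i} \rvert\,\bigr)
```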
11
 Robust ordinal depth loss
Loss Function
Pixel pairs (i, j) with i ∈ F_ord, j ∈ B_ord (a sketch of the loss follows below)
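A hedged sketch of the un-robustified form of the ordinal term: for each automatically labeled pair (i, j) with i ∈ F_ord and j ∈ B_ord, a logistic loss penalizes predicted log-depth orderings that disagree with the label r*_{ij} ∈ {−1, +1} (an assumed convention: +1 means pixel i should be farther than j). The paper's robust variant additionally caps the penalty for large violations so mislabeled pairs do not dominate.

```latex
% Logistic ordinal loss over labeled pixel pairs; the paper's robust
% version limits the growth of this penalty for large disagreements.
\mathcal{L}_{\mathrm{ord}}
  \;=\; \frac{1}{n}\sum_{(i,j)}
        \log\!\Bigl(1 + \exp\bigl(-\,r^{*}_{ij}\,(L_i - L_j)\bigr)\Bigr)
```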
12
 Generalization
• to new Internet photos from never-before-seen locations
• to other types of images from other datasets
 The effect of terms in our loss function
 Experimental Setup
• Test set: 46 of the 200 reconstructed models
• Training / validation sets: the remaining 154 models, randomly split 96% : 4%
Evaluation
13
 Error metrics
• si-RMSE (scale-invariant RMSE)
• SfM Disagreement Rate (SDR): rate at which predicted ordinal depth relations disagree with those derived from the SfM points
Evaluation and ablation study on MD test set
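A hedged reconstruction of si-RMSE: the RMSE of the log-depth residual after subtracting its mean, i.e. after the best global scale alignment between prediction and ground truth (SDR is described above and omitted here):

```latex
% Scale-invariant RMSE: RMSE of the mean-subtracted log-depth residual.
\mathrm{si\text{-}RMSE}
  \;=\; \sqrt{\;\frac{1}{n}\sum_{i=1}^{n}
        \Bigl(R_i - \tfrac{1}{n}\textstyle\sum_{j=1}^{n} R_j\Bigr)^{2}\,},
  \qquad R_i = L_i - L^{*}_i
```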
14
 Effect of network and loss variants
Evaluation and ablation study on MD test set
15
Evaluation and ablation study on MD test set
16
 Raw MD vs Clean MD
Evaluation and ablation study on MD test set
17
Generalization to other datasets
18
Generalization to other datasets
19
Generalization to other datasets
20
 Presents a new use for Internet-derived SfM+MVS data
 Generates large amounts of training data for single-view depth prediction
 The resulting model generalizes very well to other datasets
 Limitations:
• Oblique surfaces (e.g., ground), thin or complex objects (e.g., lampposts), and difficult materials (e.g., shiny glass)
• Does not predict metric depth (depth is recovered only up to an unknown scale)
Conclusion