MegaDepth: Learning Single-View Depth
Prediction from Internet Photos
CVPR2018
Zhengqi Li Noah Snavely
AI Research Institute (인공지능연구원)
Kwanghee Lee (이광희)
2
 Limitations in the available training data
• NYU: Indoor-only images
• Make3D: Small number of training examples
• KITTI: Sparse sampling
• (RGB image, depth map) pairs are difficult to collect
• RGB-D sensors (Kinect): limited to indoor use
• LIDAR: sparse depth maps
 Contributions : MegaDepth
• Multi-view internet photo collections (a virtually unlimited data source)
• Generate training data via modern structure-from-motion (SfM) and multi-view stereo (MVS)
• Challenges: noise and unreconstructable objects
• New data-cleaning methods & automatic augmentation with ordinal depth relations generated via
semantic segmentation
• High accuracy & Generalizability
Motivation
3
 Download Internet photos from Flickr (Landmarks10K dataset)
 COLMAP: state-of-the-art SfM and MVS system
 Camera poses, sparse point clouds, dense depth maps
The MegaDepth Dataset
4
 Raw depth maps from COLMAP
• Transient objects (people, cars, etc.)
• Noisy depth discontinuities
• Bleeding of background depths into foreground objects
 Modified COLMAP (both steps sketched below)
• At each iteration, keep the smaller (closer) of the two depth values at each pixel
• Apply a median filter to remove unstable depth values
Depth map refinement
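As an illustration only (not the authors' code), the two refinement steps above might look like the NumPy/SciPy sketch below; `depth_maps` is a hypothetical list of per-iteration float depth estimates for one image, zeros mark missing values, and the kernel size and deviation threshold are assumptions.

```python
import numpy as np
from scipy.ndimage import median_filter

def refine_depth(depth_maps, kernel=5, rel_tol=0.05):
    """Illustrative sketch (not the paper's code) of the two steps above:
    (1) across MVS iterations, keep the smaller (closer) depth at each pixel;
    (2) compare against a median-filtered map and drop pixels that deviate
        too much, treated here as 'unstable' depths.
    Zeros mark pixels with no depth estimate; inputs are float arrays."""
    merged = np.full_like(depth_maps[0], np.inf)
    for d in depth_maps:
        valid = d > 0
        merged[valid] = np.minimum(merged[valid], d[valid])
    merged[np.isinf(merged)] = 0.0  # pixels never reconstructed

    # Median filter over the merged map (missing pixels are zero here;
    # a real pipeline would mask them before filtering).
    smoothed = median_filter(merged, size=kernel)
    unstable = np.abs(merged - smoothed) > rel_tol * np.maximum(smoothed, 1e-6)
    merged[unstable] = 0.0
    return merged
```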
5
 Depth enhancement via semantic segmentation
• Semantic filtering (transient objects & difficult-to-reconstruct objects)
- PSPNet (150 semantic categories)
- Divide pixels into three subsets: foreground (F), background (B), and sky (S)
- If <50% of the pixels in a connected component C (of a foreground class F) have reconstructed depth, discard all depths in C
• Euclidean vs. ordinal depth (filtering training data)
- If >30% of an image I consists of valid depth values, keep that image as training data
for learning Euclidean depth
• Automatic ordinal depth labeling (a sketch of these rules follows below)
- F_ord: connected components C (of F) whose area is larger than 5% of the image
- B_ord: pixels p whose component C (of B) is larger than 5% of the image
& whose valid depth lies in the last quartile
of the full range of depths for I
Depth map refinement
F: statues, fountains, people, cars
B: buildings, towers, mountains, etc.
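A hedged sketch of how the three rules on this slide could be applied to one image; `depth` (MVS depth map, 0 = missing), `seg` (PSPNet label map), and the `fg_classes` / `bg_classes` groupings are hypothetical inputs, and the thresholds simply restate the slide's numbers.

```python
import numpy as np
from scipy.ndimage import label

def filter_and_label(depth, seg, fg_classes, bg_classes,
                     min_fg_recon=0.5, min_comp_area=0.05, min_valid=0.3):
    """Hedged sketch (not the paper's code) of semantic filtering and
    automatic ordinal labeling. Returns the cleaned depth map, whether the
    image is kept for Euclidean-depth training, and the F_ord / B_ord masks."""
    img_area = depth.size
    f_ord = np.zeros(depth.shape, dtype=bool)
    b_ord = np.zeros(depth.shape, dtype=bool)

    # Semantic filtering: per connected foreground component C, if <50% of
    # its pixels have reconstructed depth, discard all depths in C. Large
    # foreground components also join the ordinal set F_ord.
    for c in fg_classes:
        comps, n = label(seg == c)
        for k in range(1, n + 1):
            mask = comps == k
            if (depth[mask] > 0).mean() < min_fg_recon:
                depth[mask] = 0.0
            if mask.sum() > min_comp_area * img_area:
                f_ord |= mask

    # Ordinal background set B_ord: pixels in large background components
    # whose valid depth lies in the last quartile of the image's depth range.
    valid = depth > 0
    if valid.any():
        lo, hi = depth[valid].min(), depth[valid].max()
        far = depth >= lo + 0.75 * (hi - lo)
        for c in bg_classes:
            comps, n = label(seg == c)
            for k in range(1, n + 1):
                mask = comps == k
                if mask.sum() > min_comp_area * img_area:
                    b_ord |= mask & valid & far

    # Euclidean vs. ordinal: keep the image for Euclidean-depth training
    # only if >30% of its pixels still carry valid depth after filtering.
    keep_euclidean = (depth > 0).mean() > min_valid
    return depth, keep_euclidean, f_ord, b_ord
```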
6
 200 3D models from landmarks around the world
 150K reconstructed images
 After filtering: 130K valid images
 Euclidean depth data: 100K images
 Ordinal depth data: 30K images
 Additional dataset: images from [18]
Creating a dataset
MegaDepth(MD)
[18] Tanks and Temples: Benchmarking Large-Scale Scene Reconstruction (SIGGRAPH 2017)
7
 Network architecture
• VGG
• ResNet
• Hourglass network
Depth estimation network
(Figure: hourglass network)
8
 Unknown scale factor: cannot compare predicted and ground truth depths
directly
 The ratios of pairs of depths are preserved under scaling.
 In the log-depth domain, the differences between pairs of log-depths are preserved (scaling becomes an additive offset)
Scale-invariant Loss function
Three terms (combined loss sketched below):
• Scale-invariant data term
• Multi-scale scale-invariant gradient matching term
• Robust ordinal depth loss
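The three terms combine into one training loss; a sketch of its form (α and β are weighting hyperparameters set in the paper, and L_data, L_grad, L_ord are detailed on the next slides):

```latex
% Total training loss: weighted sum of the three terms above.
\mathcal{L} \;=\; \mathcal{L}_{\mathrm{data}} \;+\; \alpha\,\mathcal{L}_{\mathrm{grad}} \;+\; \beta\,\mathcal{L}_{\mathrm{ord}}
```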
9
 Scale-invariant data term
Loss Function
Notation: L = predicted log-depth map, L* = GT log-depth map (a reconstruction of the term follows below)
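A reconstruction of the data term from the definitions above, assuming the notation L_i for the predicted log depth at pixel i, L*_i for the ground-truth (MVS) log depth, R_i = L_i − L*_i, and n for the number of pixels with valid ground-truth depth:

```latex
% Scale-invariant data term (Eigen-style): invariant to adding a constant
% to all log depths, i.e. to a global scale factor on depth.
\mathcal{L}_{\mathrm{data}}
  \;=\; \frac{1}{n}\sum_{i=1}^{n} R_i^{\,2}
  \;-\; \frac{1}{n^{2}}\Bigl(\sum_{i=1}^{n} R_i\Bigr)^{\!2},
  \qquad R_i = L_i - L^{*}_i
```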
10
 Multi-scale scale-invariant gradient matching term
Loss Function
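A hedged reconstruction of the gradient term: an l1 penalty on the horizontal and vertical gradients of the log-depth residual R, summed over K scales (R^k denotes the residual downsampled to scale k):

```latex
% Multi-scale scale-invariant gradient matching term.
\mathcal{L}_{\mathrm{grad}}
  \;=\; \frac{1}{n}\sum_{k=1}^{K}\sum_{i}
        \bigl(\,\lvert \nabla_{x} R^{k}_{i} \rvert + \lvert \nabla_{y} R^{k}_{i} \rvert\,\bigr)
```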
11
 Robust ordinal depth loss
Loss Function
Pixel pairs (i, j) with i ∈ F_ord, j ∈ B_ord (a sketch of the loss follows below)
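A hedged sketch of the un-robustified form of the ordinal term: for each automatically labeled pair (i, j) with i ∈ F_ord and j ∈ B_ord, a logistic loss penalizes predicted log-depth orderings that disagree with the label r*_{ij} ∈ {−1, +1} (an assumed convention: +1 means pixel i should be farther than j). The paper's robust variant additionally caps the penalty for large violations so mislabeled pairs do not dominate.

```latex
% Logistic ordinal loss over labeled pixel pairs; the paper's robust
% version limits the growth of this penalty for large disagreements.
\mathcal{L}_{\mathrm{ord}}
  \;=\; \frac{1}{n}\sum_{(i,j)}
        \log\!\Bigl(1 + \exp\bigl(-\,r^{*}_{ij}\,(L_i - L_j)\bigr)\Bigr)
```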
12
 Generalization
• to new Internet photos from never-before-seen locations
• to other types of images from other datasets
 The effect of terms in our loss function
 Experimental Setup
• Test set: 46 of the 200 reconstructed models
• Training / validation sets: the remaining 154 models, randomly split 96% : 4%
Evaluation
13
 Error metrics
• si-RMSE (scale-invariant RMSE)
• SfM Disagreement Rate (SDR): rate at which predicted ordinal depth relations disagree with those derived from the SfM points
Evaluation and ablation study on MD test set
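A hedged reconstruction of si-RMSE: the RMSE of the log-depth residual after subtracting its mean, i.e. after the best global scale alignment between prediction and ground truth (SDR is described above and omitted here):

```latex
% Scale-invariant RMSE: RMSE of the mean-subtracted log-depth residual.
\mathrm{si\text{-}RMSE}
  \;=\; \sqrt{\;\frac{1}{n}\sum_{i=1}^{n}
        \Bigl(R_i - \tfrac{1}{n}\textstyle\sum_{j=1}^{n} R_j\Bigr)^{2}\,},
  \qquad R_i = L_i - L^{*}_i
```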
14
 Effect of network and loss variants
Evaluation and ablation study on MD test set
15
Evaluation and ablation study on MD test set
16
 Raw MD vs Clean MD
Evaluation and ablation study on MD test set
17
Generalization to other datasets
18
Generalization to other datasets
19
Generalization to other datasets
20
 Presents a new use for Internet-derived SfM+MVS data
 Generates large amounts of training data for single-view depth prediction
 The resulting model generalizes very well to other datasets
 Limitations:
• Oblique surfaces (e.g., ground), thin or complex objects (e.g., lampposts), and difficult materials (e.g., shiny glass)
• Does not predict metric depth (depth is recovered only up to an unknown scale)
Conclusion