Vincent Sitzmann, CVPR 2022 Tutorial on Neural Fields
NEURAL FIELDS IN COMPUTER VISION
Full-Day Tutorial, June 20th, 2022
neuralfields.cs.brown.edu/cvpr22
Reality Labs Research
Yiheng Xie Towaki Takikawa Shunsuke Saito Or Litany James Tompkin Vincent Sitzmann Srinath Sridhar
Prior-based Reconstruction of Neural Fields
Vincent Sitzmann
Assistant Professor, Scene Representation Group
www.scenerepresentations.com
www.vincentsitzmann.com
Motivation: Novel View Synthesis
[Figure: Observations {(image + pose & intrinsics), …} → Model → Novel Views]
[Diagram: Fitting / Optimization → Neural Scene Representation → Neural Renderer]
[Diagram: Inference → Neural Scene Representation → Neural Renderer]
Inference maps a set of observations to the parameters of a Neural Scene Representation.
Overfitting case: Inference = Fitting via Gradient Descent
[Figure: Fitting: observations {(image + pose), …} → $\mathrm{SRN}_{\phi}$ (SDF + Color MLPs). Rendering: $\mathrm{RENDER}_{\theta}$ → normal map, RGB.]
Sitzmann et al.: Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations, NeurIPS 2019.
$\min \lVert \mathrm{RENDER}_{\theta}(\mathrm{SRN}_{\phi}, \xi_i) - \mathcal{I}_i \rVert$
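The overfitting case can be illustrated with a toy stand-in. This is only a sketch under loud assumptions: the "field" is a 1D function with fixed sinusoidal features instead of the SDF + Color MLPs, `render` simply evaluates the field at the observation coordinates instead of running a neural renderer, and all names (`features`, `render`, `phi`, `xi`) are hypothetical. It shows only the structure of the objective: the field parameters are fit by gradient descent until the rendered output matches the observations.

```python
import numpy as np

rng = np.random.default_rng(0)

def features(x, freqs):
    # Fixed sinusoidal features of the coordinates x, shape (len(x), 2 * len(freqs)).
    ang = np.outer(x, freqs)
    return np.concatenate([np.sin(ang), np.cos(ang)], axis=1)

def render(phi, xi, freqs):
    # Toy stand-in for RENDER: evaluate the field with parameters phi at "poses" xi.
    return features(xi, freqs) @ phi

freqs = np.arange(1, 5, dtype=float)
xi = rng.uniform(0.0, 2.0 * np.pi, size=64)      # observation coordinates (assumed)
target = np.sin(xi) + 0.5 * np.cos(3.0 * xi)     # observations I_i (assumed)

phi = np.zeros(2 * len(freqs))                   # field parameters, fit from scratch
lr = 0.1
losses = []
for step in range(2000):
    residual = render(phi, xi, freqs) - target
    losses.append(float(np.mean(residual ** 2)))
    # Analytic gradient of the mean squared error with respect to phi.
    grad = 2.0 * features(xi, freqs).T @ residual / len(xi)
    phi -= lr * grad
```

Nothing is shared across scenes here: every new scene would restart from `phi = 0`, which is why this route gives no prior for incomplete observations.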
DeepVoxels, CVPR 2019. NeRF, ECCV 2020.
IDR, NeurIPS 2020. Plenoxels, CVPR 2022.
SIREN, NeurIPS 2020.
What if we have incomplete observations?
[Figure: a single observation $(\mathcal{I}, \xi)$ → $\mathrm{SRN}_{\phi}$ (SDF + Color MLPs) → $\mathrm{RENDER}_{\theta}$ → normal map, RGB. No 3D information.]
Sitzmann et al.: Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations, NeurIPS 2019.
$\min \lVert \mathrm{RENDER}_{\theta}(\mathrm{SRN}_{\phi}, \xi_i) - \mathcal{I}_i \rVert$
Inferring Neural Fields
[Diagram: ? → Neural Scene Representation → Neural Renderer]
If only a single observation is available, or if only part of the scene has been observed, inference needs to be prior-based, i.e., we need to learn to reconstruct.
General Framework: Encoder-Decoder
[Diagram: Inference = Encoder → Latent Variables $\{z_i\}_{i=1}^{N}$ → Decoder → Neural Scene Representation → Neural Renderer]
What are the latent variables?
How to predict latent variables from observations?
How do we decode latent variables into the Neural Field?
What are the latent variables?
Key Consideration: Locality.
[Figure: Global Conditioning vs. Local Conditioning. From: Neural Fields in Visual Computing and Beyond, Xie et al., EG STAR 2022]
Global Latent Codes
Global conditioning
Latent code $z$
Hypernetwork [1]
[1] Schmidhuber et al. 1992, Schmidhuber et al. 1993, Stanley et al. 2009, Ha et al. 2016
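Global conditioning through a hypernetwork can be sketched as follows. The assumptions are stated plainly: the hypernetwork here is a single random linear layer, the field is a two-layer ReLU MLP from $\mathbb{R}^3$ to $\mathbb{R}^1$, and the names `hypernetwork`, `field`, `LATENT` and `HIDDEN` are hypothetical. The only point is that one latent code $z$ determines all parameters of the field.

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT, HIDDEN = 8, 16

# Parameter shapes of the field MLP: R^3 -> R^HIDDEN -> R^1.
shapes = [(3, HIDDEN), (HIDDEN,), (HIDDEN, 1), (1,)]
n_params = sum(int(np.prod(s)) for s in shapes)

# Hypernetwork weights (assumed: one linear layer, untrained, for illustration).
W_hyper = rng.normal(0.0, 0.1, size=(LATENT, n_params))
b_hyper = np.zeros(n_params)

def hypernetwork(z):
    # Map the latent code to a flat parameter vector, then split it per layer.
    flat = z @ W_hyper + b_hyper
    params, offset = [], 0
    for s in shapes:
        size = int(np.prod(s))
        params.append(flat[offset:offset + size].reshape(s))
        offset += size
    return params

def field(x, params):
    # Neural field evaluated at 3D points x of shape (n, 3).
    W1, b1, W2, b2 = params
    h = np.maximum(x @ W1 + b1, 0.0)
    return h @ W2 + b2

z_a, z_b = rng.normal(size=LATENT), rng.normal(size=LATENT)
x = rng.uniform(-1.0, 1.0, size=(5, 3))
out_a = field(x, hypernetwork(z_a))   # field of "scene a"
out_b = field(x, hypernetwork(z_b))   # a different code gives a different field
```

A simpler alternative with the same global character is to concatenate $z$ to the input coordinates; the hypernetwork variant is shown because it is the one named on the slide.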
Global Latent Codes: Enables reconstruction from partial observations!
Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations, NeurIPS 2019.
Differentiable Volumetric Rendering, Niemeyer et al., CVPR 2020.
DeepSDF, Occupancy Networks, IM-Net
Key limitation: Simple, non-compositional scenes.
But: latent space for full objects (interpolation, etc.).
Local Latent Codes
From point clouds: Conditioning on Feature Voxel grids
Convolutional Occupancy Networks [Peng et al. 2020]
Local Implicit Grid Representations for 3D Scenes [Jiang et al. 2020]
Implicit Functions in Feature Space for 3D Shape Reconstruction and Completion [Chibane et al. 2020]
Deep Local Shapes: Learning Local SDF Priors for Detailed 3D Reconstruction [Chabra et al. 2020]
Generalizes to Compositional Scenes!
But: cubic memory complexity :/
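A minimal sketch of conditioning on a feature voxel grid, assuming a dense grid of shape (R, R, R, C) over the unit cube and trilinear interpolation as the lookup. The function names and the grid layout are assumptions, not taken from any of the cited papers. In a full model the interpolated feature would be passed, together with the query point, to a small decoder MLP.

```python
import numpy as np

def trilinear_sample(grid, p):
    """Interpolate a feature at point p in [0, 1]^3 from a grid of shape (R, R, R, C), R >= 2."""
    R = grid.shape[0]
    g = np.clip(p, 0.0, 1.0) * (R - 1)               # continuous grid coordinates
    i0 = np.minimum(np.floor(g).astype(int), R - 2)  # lower corner of the enclosing cell
    t = g - i0                                       # fractional offset inside the cell
    out = np.zeros(grid.shape[-1])
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = ((t[0] if dx else 1.0 - t[0]) *
                     (t[1] if dy else 1.0 - t[1]) *
                     (t[2] if dz else 1.0 - t[2]))
                out += w * grid[i0[0] + dx, i0[1] + dy, i0[2] + dz]
    return out

def voxel_grid_floats(R, C):
    # Number of stored values: grows with the cube of the resolution.
    return R ** 3 * C

# Demo grid whose single feature channel is a linear ramp i + 2j + 3k.
R = 4
i, j, k = np.meshgrid(np.arange(R), np.arange(R), np.arange(R), indexing="ij")
demo_grid = (i + 2 * j + 3 * k)[..., None].astype(float)
demo_feature = trilinear_sample(demo_grid, np.array([0.5, 0.25, 0.75]))
```

Doubling the resolution multiplies `voxel_grid_floats` by eight, which is the cubic memory cost noted above.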
From point clouds: Ground-plan and Tri-plane factorizations
Convolutional Occupancy Networks [Peng et al. 2020]
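A hedged sketch of a tri-plane lookup: the point is projected onto three axis-aligned feature planes, each plane is sampled bilinearly, and the three features are summed. Summation is one possible aggregation and is an assumption here, as are the names `bilinear_sample` and `triplane_feature`, the plane keys, and the unit-cube convention. A ground-plan factorization would keep only one of the three planes.

```python
import numpy as np

def bilinear_sample(plane, uv):
    """Interpolate a feature at uv in [0, 1]^2 from a plane of shape (R, R, C), R >= 2."""
    R = plane.shape[0]
    g = np.clip(uv, 0.0, 1.0) * (R - 1)
    i0 = np.minimum(np.floor(g).astype(int), R - 2)
    t = g - i0
    # Interpolate along the second axis on two neighbouring rows, then between the rows.
    row0 = (1.0 - t[1]) * plane[i0[0], i0[1]] + t[1] * plane[i0[0], i0[1] + 1]
    row1 = (1.0 - t[1]) * plane[i0[0] + 1, i0[1]] + t[1] * plane[i0[0] + 1, i0[1] + 1]
    return (1.0 - t[0]) * row0 + t[0] * row1

def triplane_feature(planes, p):
    """Sum the features of the xy, xz and yz projections of p in [0, 1]^3."""
    x, y, z = p
    return (bilinear_sample(planes["xy"], np.array([x, y])) +
            bilinear_sample(planes["xz"], np.array([x, z])) +
            bilinear_sample(planes["yz"], np.array([y, z])))

def triplane_floats(R, C):
    # Three planes: quadratic in the resolution instead of cubic.
    return 3 * R ** 2 * C

# Demo planes whose single channel is the ramp i + j.
R = 5
ramp = np.add.outer(np.arange(R), np.arange(R))[..., None].astype(float)
demo_planes = {"xy": ramp, "xz": ramp, "yz": ramp}
demo_feature = triplane_feature(demo_planes, np.array([0.1, 0.2, 0.3]))
```

At resolution R with C channels, the three planes store `3 * R**2 * C` values, against `R**3 * C` for a dense voxel grid.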
From point clouds: Conditioning on Reconstructed Voxelgrids
5x less memory!
How to locally condition if the sensor domain differs from the field domain?
Local Conditioning: Pixel-Aligned Features.
PIFu, Saito et al., ICCV 2019.
pixelNeRF, Yu et al., CVPR 2021.
GRF: Learning a general radiance field…, Trevithick et al.
Generalizes much better than global conditioning (like SRNs, DVR).
No persistent 3D representation.
All priors are learned in image space.
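Pixel-aligned conditioning can be sketched as "project the 3D query point into the image, then sample the 2D feature map there". The sketch below assumes a pinhole camera with intrinsics `K`, a 4x4 world-to-camera matrix, and a feature map of shape (H, W, C) that would come from an image encoder (here replaced by a coordinate ramp for illustration). All names are hypothetical.

```python
import numpy as np

def project(p_world, K, world_to_cam):
    # Transform to camera coordinates, then apply the pinhole projection.
    p_cam = world_to_cam[:3, :3] @ p_world + world_to_cam[:3, 3]
    uvw = K @ p_cam
    return uvw[:2] / uvw[2], p_cam[2]            # pixel (u, v) and depth

def sample_feature(feat, uv):
    """Bilinearly sample feat of shape (H, W, C) at pixel uv = (u, v), u along the width."""
    H, W, _ = feat.shape
    u = float(np.clip(uv[0], 0, W - 1))
    v = float(np.clip(uv[1], 0, H - 1))
    u0 = int(min(np.floor(u), W - 2))
    v0 = int(min(np.floor(v), H - 2))
    a, b = u - u0, v - v0
    return ((1 - a) * (1 - b) * feat[v0, u0] + a * (1 - b) * feat[v0, u0 + 1] +
            (1 - a) * b * feat[v0 + 1, u0] + a * b * feat[v0 + 1, u0 + 1])

# Assumed camera and a stand-in feature map whose channels are (u, v).
K = np.array([[100.0, 0.0, 32.0], [0.0, 100.0, 32.0], [0.0, 0.0, 1.0]])
world_to_cam = np.eye(4)
H = W = 64
vv, uu = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
feat = np.stack([uu, vv], axis=-1).astype(float)

p = np.array([0.1, -0.05, 2.0])
uv, depth = project(p, K, world_to_cam)
f_near = sample_feature(feat, uv)
uv_far, depth_far = project(2.0 * p, K, world_to_cam)   # same camera ray, twice as far
f_far = sample_feature(feat, uv_far)
```

Every point on the same camera ray receives the same image feature, so a decoder also needs the point's depth or camera-space coordinates to tell them apart. This is one way to read "no persistent 3D representation": the features live in image space only.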
Object-centric representations
CoLF: Unsupervised Learning of Compositional Object Light Fields, arXiv 2022.
uORF, ICLR 2022
Learns to disentangle objects in a self-supervised manner.
Inference of object-centric latent codes is a hard problem.
Currently limited to relatively simple scenes, but progress is quick!
Conditional Ground Plans for Single-Image 3D Reconstruction
Seeing 3D Objects in a Single Image via Self-Supervised Static-Dynamic Disentanglement, Sharma et al. 2022
How to infer latent codes?
Encoding vs. Auto-Decoding
[Figure: Encoding vs. Auto-Decoding. From: Neural Fields in Visual Computing and Beyond, Xie et al., EG STAR 2022]
Auto-Decoding for inverse graphics
[Diagram: Latent code $z_0$ → RENDER]
$z = \arg\min_{z} \lVert \mathrm{RENDER}(\Phi) - \mathcal{I} \rVert$
3D-structured, resolution-invariant!
Samples need not lie on regular grids!
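Auto-decoding can be sketched with a toy model: the decoder stays frozen, and the latent code is found by gradient descent on the reconstruction error. The assumptions are loud ones: the decoder is a fixed linear map `D` from the latent code to the weights of a 1D sinusoidal field, "rendering" is plain evaluation of that field, and all names are hypothetical. The observations are deliberately sampled at irregular positions.

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT = 4
freqs = np.arange(1, 4, dtype=float)

def basis(x):
    # Sinusoidal basis of the toy field, shape (len(x), 6).
    ang = np.outer(x, freqs)
    return np.concatenate([np.sin(ang), np.cos(ang)], axis=1)

# Frozen "decoder" (assumed pre-trained): latent code -> field weights.
D = np.eye(LATENT, 6) + 0.1 * rng.normal(size=(LATENT, 6))

def decode(z, x):
    # Field defined by latent code z, evaluated at coordinates x.
    return basis(x) @ (z @ D)

z_true = rng.normal(size=LATENT)
x_obs = rng.uniform(0.0, 2.0 * np.pi, size=40)   # irregular sample positions
y_obs = decode(z_true, x_obs)                    # observations

z = np.zeros(LATENT)                             # initial code z_0
for _ in range(3000):
    residual = decode(z, x_obs) - y_obs
    # Gradient of the mean squared error with respect to z only; D is not updated.
    grad = 2.0 * D @ (basis(x_obs).T @ residual) / len(x_obs)
    z -= 0.1 * grad
```

Because the loss is evaluated wherever observations exist, the same loop works for any number and placement of samples, which is the resolution-invariance point made above.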
Out-of-distribution generalization
3D structure enables generalization to out-of-distribution camera poses!
$z = \arg\min_{z} \lVert \mathrm{RENDER}_{\theta}(\mathrm{SRN}_{\phi = \mathrm{HN}_{\psi}(z)}, \xi) - \mathcal{I} \rVert$
[Figure panels: Input, CNN encoder, Reconstruction]
Other forms of Generalization: Transformer Decoders
AIR-Nets, Giebenhain et al. 2022
Scene Representation Transformer, Sajjadi et al. 2022
Other forms of Generalization: Gradient-based meta-learning
[Diagram: Meta-Representation → in-the-loop specialization via gradient descent → Representation]
MetaSDF: Meta-learning Signed Distance Functions, NeurIPS 2020
Backpropagate through gradient-descent inference at training time.
Learn an initialization that explains held-out observations when fit to context observations.
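The idea of backpropagating through gradient-descent inference can be sketched on a toy problem where the meta-gradient is available in closed form. This is not the MetaSDF implementation: it assumes a linear model on fixed features, a single inner gradient step, and a synthetic family of 1D tasks; all names are hypothetical. For a least-squares inner loss, the Jacobian of the adapted weights with respect to the initialization is $I - \alpha H_c$, where $H_c$ is the Hessian of the context loss.

```python
import numpy as np

rng = np.random.default_rng(0)

def feats(x):
    return np.stack([np.sin(x), np.cos(x), np.ones_like(x)], axis=1)

def sample_task():
    # Each task is a function with weights close to a shared mean (assumed task family).
    w = np.array([1.0, -0.5, 2.0]) + 0.1 * rng.normal(size=3)
    x_c = rng.uniform(0.0, 2.0 * np.pi, 5)       # context observations
    x_t = rng.uniform(0.0, 2.0 * np.pi, 20)      # held-out observations
    return feats(x_c), feats(x_c) @ w, feats(x_t), feats(x_t) @ w

def mse_grad(w, Phi, y):
    return 2.0 * Phi.T @ (Phi @ w - y) / len(y)

def adapt(w0, Phi_c, y_c, alpha):
    # Inference: one gradient step on the context observations.
    return w0 - alpha * mse_grad(w0, Phi_c, y_c)

def meta_loss_and_grad(w0, task, alpha):
    Phi_c, y_c, Phi_t, y_t = task
    w1 = adapt(w0, Phi_c, y_c, alpha)
    loss = float(np.mean((Phi_t @ w1 - y_t) ** 2))
    H_c = 2.0 * Phi_c.T @ Phi_c / len(y_c)
    jac = np.eye(len(w0)) - alpha * H_c          # d w1 / d w0, symmetric
    return loss, jac @ mse_grad(w1, Phi_t, y_t)  # chain rule through the inner step

alpha, meta_lr = 0.1, 0.05
eval_tasks = [sample_task() for _ in range(50)]

def eval_loss(w0):
    return float(np.mean([meta_loss_and_grad(w0, t, alpha)[0] for t in eval_tasks]))

w0 = np.zeros(3)                                 # meta-learned initialization
loss_before = eval_loss(w0)
for _ in range(2000):
    _, g = meta_loss_and_grad(w0, sample_task(), alpha)
    w0 -= meta_lr * g
loss_after = eval_loss(w0)
```

With a neural field instead of a linear model, the Jacobian is not written out by hand; automatic differentiation through the inner loop plays the same role.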
Inferring Neural Scene Representations
[Diagram: Inference → Neural Scene Representation → Neural Renderer]
Generalization enables reconstruction from incomplete observations.
Any other benefits?
Problem: Forward map might be expensive!
3D-structured Neural Scene Representations
$\Phi: \mathbb{R}^3 \to \mathbb{R}^n$
Hundreds of samples per ray.
Time- and memory-intensive training.
Light Field [Adelson et al. 1991, Levoy et al. 1996, Gortler et al. 1996]
$\Phi: \mathbb{R}^3 \to \mathbb{R}^n$
Light Field Networks: An Alternative Scene Representation
[Diagram labels: Conditioning, Plücker coordinates]
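A short sketch of Plücker coordinates for a ray, assuming the common convention of a unit direction $d$ and moment $m = o \times d$ for a ray through origin point $o$. The function name is hypothetical. The useful property is that the six numbers identify the ray itself, not the particular point chosen on it.

```python
import numpy as np

def plucker(origin, direction):
    # Normalize the direction, then form the moment vector o x d.
    d = direction / np.linalg.norm(direction)
    m = np.cross(origin, d)
    return np.concatenate([d, m])

o = np.array([0.0, 0.0, 1.0])
d = np.array([1.0, 0.0, 0.0])
ray = plucker(o, d)                      # -> [1, 0, 0, 0, 1, 0]
ray_shifted = plucker(o + 3.0 * d, d)    # another point on the same ray
```

A light field network would take such a 6D vector as input and return the color of the ray with one network evaluation, without sampling points along the ray.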
Rendering is learned / representation is “already rendered”
[Diagram: Inference → “Rendered” Neural Scene Representation]
More difficult inference problem, but more general renderer.
Light Field Networks vs. Volumetric Rendering (pixelNeRF)
Light Field Networks: 500 FPS, 1 evaluation per ray
Volumetric Rendering (pixelNeRF): 0.033 FPS, 196 evaluations per ray
Real-time. No post-processing, no discrete data structures (octrees, voxelgrids, …).
>100x reduction in memory: Can be trained on small GPUs!
[Figure: speed axis with ticks 1x, 10x, 100x, 1,000x, 15,000x]
Light Fields with Transformers: Scene Representation Transformer (CVPR 2022)
No 3D Renderer: Directly parameterizes Light Field!
Things I didn’t talk about
● Generalization in 2D, 1D, etc. neural fields: images, audio… See LIIF (Chen et al. 2021), …
● Neural field-to-neural field translation; see Spatially-Adaptive Pixelwise Networks for Fast Image Translation (Shaham et al. 2020).
● Generalization for robotics applications; see Neural Descriptor Fields (Simeonov et al.), 3D neural scene … (Li et al., CoRL 2022), Learning Multi-Object Dynamics... (Driess et al. 2022), …
● Generalization for structured fields with known a-priori structure (humans, hands, faces, etc.).
Outlook
● Generalization gaining traction: Single-scene optimization too limited.
● Opens up completely new ways of thinking about problems: Can amortize otherwise expensive forward maps (light fields).
● Making progress on the question of compositionality w/ object-centric and locally conditioned neural fields. More to come.
● Processing & inferring regular grids is easy. Harder for point clouds / factorized representations, etc.
● Transformers seem to learn a type of local conditioning, but more research necessary.
Q & A
Thanks!

Tutorial on Generalization in Neural Fields, CVPR 2022 Tutorial on Neural Fields in Computer Vision
