Tutorial on Generalization in Neural Fields, CVPR 2022 Tutorial on Neural Fields in Computer Vision

Vincent Sitzmann, CVPR 2022 Tutorial on Neural Fields
NEURAL FIELDS IN COMPUTER VISION
Full-Day Tutorial, June 20th, 2022
neuralfields.cs.brown.edu/cvpr22
Reality Labs Research
Yiheng Xie Towaki Takikawa Shunsuke Saito Or Litany James Tompkin Vincent Sitzmann Srinath Sridhar

Prior-based Reconstruction of Neural Fields
Vincent Sitzmann
Assistant Professor, Scene Representation Group
www.scenerepresentations.com
www.vincentsitzmann.com

Motivation: Novel View Synthesis
Observations (Image + Pose & Intrinsics) → Model → Novel Views

Fitting / Optimization
Neural Scene Representation
Neural Renderer

Inference
Neural Scene Representation
Neural Renderer
Inference maps a set of observations to the parameters of a Neural Scene Representation.

Overfitting case: Inference = Fitting via Gradient Descent
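Fitting: observations (images + poses) → $\mathrm{SRN}_{\phi}$ (SDF + Color MLPs)
Rendering: $\mathrm{RENDER}_{\theta}$ → Normal map, RGB
Sitzmann et al.: Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations, NeurIPS 2019.
$\min \lVert \mathrm{RENDER}_{\theta}(\mathrm{SRN}_{\phi}, \xi_i) - \mathcal{I}_i \rVert$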

DeepVoxels, CVPR 2019. NeRF, ECCV 2020.
IDR, NeurIPS 2020. Plenoxels, CVPR 2022.
SIREN, NeurIPS 2020.

What if we have incomplete observations?
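Single observation $(\mathcal{I}, \xi)$: no 3D information.
$\mathrm{SRN}_{\phi}$ (SDF + Color MLPs), $\mathrm{RENDER}_{\theta}$ → Normal map, RGB
Sitzmann et al.: Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations, NeurIPS 2019.
$\min \lVert \mathrm{RENDER}_{\theta}(\mathrm{SRN}_{\phi}, \xi_i) - \mathcal{I}_i \rVert$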

Inferring Neural Fields
Neural Scene Representation
Neural Renderer
If only a single observation is available, or if only part of the scene has been observed,
inference needs to be prior-based, i.e., we need to learn to reconstruct.

General Framework: Encoder-Decoder
Inference: Encoder → Latent Variables $\{z_i\}_{i=1}^{N}$ → Decoder
Neural Scene Representation
Neural Renderer

What are the latent variables?
Inference: Encoder, Latent Variables $\{z_i\}_{i=1}^{N}$
Neural Scene Representation
Neural Renderer

How to predict latent variables from observations?
Inference
Neural Scene Representation
Neural Renderer

How do we decode latent variables into the Neural Field?
Inference
Neural Scene Representation
Neural Renderer

What are the latent variables?
Inference
Neural Scene Representation
Neural Renderer

Key Consideration: Locality.
Neural Fields in Visual Computing and Beyond, Xie et al., EG STAR 2022
Panels: Global Conditioning; Local Conditioning

Global Latent Codes

Global conditioning
Latent code 𝑧

Global conditioning
Latent code 𝑧 → Hypernetwork [1]
[1] Schmidhuber et al. 1992, Schmidhuber et al. 1993, Stanley et al. 2009, Ha et al. 2016
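A minimal sketch of global conditioning with a hypernetwork, assuming a random linear hypernetwork and a tiny one-hidden-layer field. The names `make_hypernetwork`, `field`, `HIDDEN` and all sizes are hypothetical and chosen for illustration only; they are not taken from the cited papers.

```python
import math
import random

HIDDEN = 4                      # hidden units of the field MLP (illustrative)
N_PARAMS = 3 * HIDDEN + HIDDEN + HIDDEN + 1   # W1 (HIDDEN x 3), b1, w2, b2

def make_hypernetwork(latent_dim, n_field_params, seed=0):
    """Random linear hypernetwork: latent code z -> flat vector of field weights."""
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 0.5) for _ in range(latent_dim)]
         for _ in range(n_field_params)]

    def hypernetwork(z):
        return [sum(w * zj for w, zj in zip(row, z)) for row in W]

    return hypernetwork

def field(params, x):
    """Tiny MLP field R^3 -> R whose weights are supplied from outside."""
    W1 = [params[3 * h: 3 * h + 3] for h in range(HIDDEN)]
    b1 = params[3 * HIDDEN: 4 * HIDDEN]
    w2 = params[4 * HIDDEN: 5 * HIDDEN]
    b2 = params[5 * HIDDEN]
    hidden = [math.tanh(sum(w * xi for w, xi in zip(W1[h], x)) + b1[h])
              for h in range(HIDDEN)]
    return sum(w * a for w, a in zip(w2, hidden)) + b2

hyper = make_hypernetwork(latent_dim=2, n_field_params=N_PARAMS)
phi_a = hyper([1.0, 0.0])       # field weights for scene code a
phi_b = hyper([0.0, 1.0])       # field weights for scene code b
point = (0.1, -0.2, 0.3)
value_a = field(phi_a, point)   # same query point, different scenes
value_b = field(phi_b, point)
```

A single latent code selects the whole field here: changing 𝑧 changes every weight of the field at once, which is what makes the conditioning global.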

Global Latent Codes: Enables reconstruction from partial observations!
Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations, NeurIPS 2019.
Differentiable Volumetric Rendering, Niemeyer et al., CVPR 2020.
DeepSDF, Occupancy Networks, IM-Net

Key limitation: restricted to simple, non-compositional scenes.
But: yields a latent space over full objects (interpolation, etc.).

Local Latent Codes

From point clouds: Conditioning on Feature Voxel grids
Convolutional Occupancy Networks [Peng et al. 2020]
Local Implicit Grid Representations for 3D Scenes [Jiang et al. 2020]
Implicit Functions in Feature Space for 3D Shape Reconstruction and Completion [Chibane et al. 2020]
Deep Local Shapes: Learning Local SDF Priors for Detailed 3D Reconstruction [Chabra et al. 2020]

Generalizes to Compositional Scenes!
But: cubic memory complexity in the grid resolution.
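A minimal sketch of local conditioning on a feature voxel grid, assuming a dense cubic grid stored as nested lists and trilinear interpolation of the latent features at a query point. The names `trilinear` and `voxel_latent_count` are hypothetical; in the cited methods the interpolated feature would then be fed, together with the point, to a decoder MLP.

```python
def trilinear(grid, p):
    """Interpolate grid[ix][iy][iz] (a list of floats) at point p given in voxel units."""
    n = len(grid)                          # grid is n x n x n, with n >= 2
    idx, frac = [], []
    for c in p:
        c = min(max(c, 0.0), n - 1.0)      # clamp the coordinate to the grid
        i0 = min(int(c), n - 2)            # index of the lower corner
        idx.append(i0)
        frac.append(c - i0)
    channels = len(grid[0][0][0])
    out = [0.0] * channels
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                # weight of this corner of the enclosing voxel
                w = ((frac[0] if dx else 1 - frac[0])
                     * (frac[1] if dy else 1 - frac[1])
                     * (frac[2] if dz else 1 - frac[2]))
                feat = grid[idx[0] + dx][idx[1] + dy][idx[2] + dz]
                for k in range(channels):
                    out[k] += w * feat[k]
    return out

def voxel_latent_count(n):
    """Number of latent codes stored by an n x n x n grid."""
    return n ** 3

# Toy grid whose single feature channel is a linear function of the index.
grid = [[[[x + 2.0 * y + 3.0 * z] for z in range(3)] for y in range(3)] for x in range(3)]
local_feature = trilinear(grid, (0.5, 1.25, 1.75))
```

Doubling the resolution multiplies `voxel_latent_count` by eight, which is the cubic growth the slide refers to.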

From Point clouds: Ground-plan and Tri-plane factorizations
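A rough sketch of why factorized layouts save memory, assuming square feature planes of side `n` with `channels` features per cell. The function names and the example sizes are hypothetical and only count stored features; they do not reproduce any memory figure reported by a specific method.

```python
def voxel_grid_size(n, channels):
    """Features stored by a dense n x n x n grid."""
    return n ** 3 * channels

def triplane_size(n, channels):
    """Features stored by three axis-aligned n x n planes."""
    return 3 * n ** 2 * channels

def ground_plan_size(n, channels):
    """Features stored by a single n x n ground plan."""
    return n ** 2 * channels

def triplane_projections(p):
    """Project a 3D point onto the XY, XZ and YZ planes used for feature lookup."""
    x, y, z = p
    return (x, y), (x, z), (y, z)

dense = voxel_grid_size(128, 32)
planes = triplane_size(128, 32)
ground = ground_plan_size(128, 32)
```

The plane-based layouts grow quadratically in `n` instead of cubically; the per-point feature is typically assembled from the 2D lookups at the projected coordinates.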

From point clouds: Conditioning on Reconstructed Voxelgrids
5x less memory!

How to locally condition if the sensor domain differs from the field domain?

Local Conditioning: Pixel-Aligned Features.
PIFu, Saito et al., ICCV 2019.
pixelNeRF, Yu et al., CVPR 2021.
GRF: Learning a General Radiance Field…, Trevithick et al.

Generalizes much better than global conditioning (like SRNs, DVR).
No persistent 3D representation.
All priors are learned in image space.
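A minimal sketch of pixel-aligned conditioning, assuming a pinhole camera, a query point already expressed in camera coordinates, and a feature map stored as nested lists. The helper names and the intrinsics values are hypothetical; in practice the feature map would come from a CNN image encoder.

```python
def project(point_cam, fx, fy, cx, cy):
    """Pinhole projection of a camera-space point (x, y, z), z > 0, to pixel (u, v)."""
    x, y, z = point_cam
    return fx * x / z + cx, fy * y / z + cy

def bilinear(feature_map, u, v):
    """Sample feature_map[row][col] (a list of floats) at continuous pixel (u, v) = (col, row)."""
    h, w = len(feature_map), len(feature_map[0])
    u = min(max(u, 0.0), w - 1.0)          # clamp to the image
    v = min(max(v, 0.0), h - 1.0)
    c0, r0 = min(int(u), w - 2), min(int(v), h - 2)
    a, b = u - c0, v - r0
    channels = len(feature_map[0][0])
    return [
        (1 - a) * (1 - b) * feature_map[r0][c0][k]
        + a * (1 - b) * feature_map[r0][c0 + 1][k]
        + (1 - a) * b * feature_map[r0 + 1][c0][k]
        + a * b * feature_map[r0 + 1][c0 + 1][k]
        for k in range(channels)
    ]

def pixel_aligned_feature(feature_map, point_cam, intrinsics):
    """Feature used to condition the field at a 3D point: the image feature it projects onto."""
    u, v = project(point_cam, *intrinsics)
    return bilinear(feature_map, u, v)

# Toy 4 x 4 feature map whose two channels are simply (col, row).
feature_map = [[[float(c), float(r)] for c in range(4)] for r in range(4)]
intrinsics = (2.0, 2.0, 1.5, 1.5)          # fx, fy, cx, cy (illustrative values)
near = pixel_aligned_feature(feature_map, (0.25, -0.5, 1.0), intrinsics)
far = pixel_aligned_feature(feature_map, (0.5, -1.0, 2.0), intrinsics)
```

Both points lie on the same camera ray, so they receive the same image feature. The features live in image space rather than in a persistent 3D structure, so any distinction along the ray has to come from additional inputs such as the point coordinates.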

Object-centric representations
CoLF: Unsupervised Learning of Compositional Object Light Fields, arXiv 2022.

uORF, ICLR 2022
Learns to disentangle objects in a self-supervised manner.
Inference of object-centric latent codes is a hard problem.
Currently limited to relatively simple scenes, but progress is quick!

Conditional Ground Plans for Single-Image 3D Reconstruction
Seeing 3D Objects in a Single Image via Self-Supervised Static-Dynamic Disentanglement, Sharma et al. 2022

How to infer latent codes?
Inference
Neural Scene Representation
Neural Renderer

Encoding vs. Auto-Decoding
Panels: Encoding; Auto-Decoding

Auto-Decoding for inverse graphics
RENDER
Latent code $z_0$

$\hat{z} = \arg\min_{z} \lVert \mathrm{RENDER}(\Phi) - \mathcal{I} \rVert$
3D-structured, resolution-invariant!
Samples need not lie on regular grids!
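A minimal sketch of auto-decoding, assuming a frozen linear map stands in for the learned decoder and renderer, and plain gradient descent is run on the latent code only. The names `render`, `auto_decode`, the step count and the learning rate are hypothetical choices for this toy problem.

```python
def render(decoder, z):
    """Stand-in for RENDER(Φ): a fixed linear decoder mapping a latent code to 'pixels'."""
    return [sum(d * zj for d, zj in zip(row, z)) for row in decoder]

def auto_decode(decoder, image, steps=500, lr=0.1):
    """Gradient descent on z only; the decoder weights stay frozen."""
    z = [0.0] * len(decoder[0])            # initial latent code z_0
    for _ in range(steps):
        residual = [r - i for r, i in zip(render(decoder, z), image)]
        # gradient of 0.5 * ||D z - I||^2 with respect to z is D^T (D z - I)
        grad = [sum(decoder[p][j] * residual[p] for p in range(len(decoder)))
                for j in range(len(z))]
        z = [zj - lr * g for zj, g in zip(z, grad)]
    return z

decoder = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # three 'pixels', two latent dimensions
z_true = [0.5, -1.5]
image = render(decoder, z_true)

z_full = auto_decode(decoder, image)              # all pixels observed
# Only pixels 0 and 2 observed: the loss is simply evaluated on those samples.
z_partial = auto_decode([decoder[0], decoder[2]], [image[0], image[2]])
```

Because the loss is a sum over whatever samples are available, the same procedure works when observations are sparse or irregularly placed, as long as they still constrain the latent code.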

Out-of-distribution generalization
3D structure enables generalization to out-of-distribution camera poses!
$\hat{z} = \arg\min_{z} \lVert \mathrm{RENDER}_{\theta}(\mathrm{SRN}_{\phi = \mathrm{HN}_{\psi}(z)}, \xi) - \mathcal{I} \rVert$
Panels: Input; CNN encoder; Reconstruction

Other forms of Generalization: Transformer Decoders
AIR-Nets, Giebenhain et al. 2022
Scene Representation Transformer, Sajjadi et al. 2022

Other forms of Generalization: Gradient-based meta-learning
Meta-Representation → in-the-loop specialization via gradient descent → Representation
MetaSDF: Meta-learning Signed Distance Functions, NeurIPS 2020
Backpropagate through gradient-descent inference at training time.
Learn an initialization that explains held-out observations when fit to context observations.
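A toy sketch of gradient-based meta-learning, assuming a one-parameter model f(x) = w·x, a single inner gradient step, and a hand-derived gradient through that step. The task family, step sizes and function names are hypothetical; this illustrates the training loop only and is not the MetaSDF implementation.

```python
def inner_step(w0, x_ctx, y_ctx, alpha):
    """One gradient-descent step on the context loss (w*x - y)^2, starting from w0."""
    grad = 2.0 * x_ctx * (w0 * x_ctx - y_ctx)
    return w0 - alpha * grad

def meta_gradient(w0, task, alpha):
    """Gradient of the held-out loss with respect to the initialization w0,
    backpropagated through the inner step."""
    (x_ctx, y_ctx), (x_tgt, y_tgt) = task
    w1 = inner_step(w0, x_ctx, y_ctx, alpha)
    dloss_dw1 = 2.0 * x_tgt * (w1 * x_tgt - y_tgt)
    dw1_dw0 = 1.0 - 2.0 * alpha * x_ctx ** 2   # derivative of the inner update
    return dloss_dw1 * dw1_dw0

def meta_train(tasks, alpha=0.25, lr=0.05, steps=200):
    """Learn the initialization by descending the summed held-out losses."""
    w0 = 0.0
    for _ in range(steps):
        w0 -= lr * sum(meta_gradient(w0, t, alpha) for t in tasks)
    return w0

# Each task is a line y = s * x, with one context and one held-out observation.
slopes = [2.0, 3.0, 4.0]
tasks = [((1.0, s * 1.0), (2.0, s * 2.0)) for s in slopes]
w_init = meta_train(tasks)

# Test-time specialization to a new task (s = 4) with a single context observation.
w_specialized = inner_step(w_init, 1.0, 4.0, 0.25)
```

In this toy setting the learned initialization settles at the mean slope of the training tasks, and one inner step moves it halfway towards the slope of the new task.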

Inferring Neural Scene Representations
Inference
Neural Scene Representation
Neural Renderer
Generalization enables reconstruction from incomplete observations.
Any other benefits?

Problem: Forward map might be expensive!
Inference
Neural Scene Representation
Neural Renderer

3D-structured Neural Scene Representations
$\Phi: \mathbb{R}^3 \to \mathbb{R}^n$
Hundreds of samples per ray.
Time- and memory-intensive training.
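A minimal sketch of why a 3D-structured field needs many evaluations per pixel, assuming the standard alpha-compositing quadrature along a ray. The densities and colors below are constants for illustration; in a real renderer each entry would be one query of the field, and the sample count of 196 mirrors the per-ray figure quoted for pixelNeRF later in these slides.

```python
import math

def composite(sigmas, colors, deltas):
    """Alpha-composite samples along one ray; returns (pixel value, remaining transmittance)."""
    transmittance = 1.0
    pixel = 0.0
    for sigma, color, delta in zip(sigmas, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)   # opacity of this ray segment
        pixel += transmittance * alpha * color
        transmittance *= 1.0 - alpha
    return pixel, transmittance

n_samples = 196                                  # one field evaluation per sample
sigmas = [2.0] * n_samples                       # constant density (illustrative)
colors = [0.8] * n_samples                       # constant gray value (illustrative)
deltas = [1.0 / n_samples] * n_samples           # ray of unit length
pixel, transmittance = composite(sigmas, colors, deltas)
```

For a homogeneous medium the result matches the closed form `0.8 * (1 - exp(-2))`, which gives a quick sanity check of the quadrature.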

Light Field
[Adelson et al. 1991, Levoy et al. 1996, Gortler et al. 1996]

Light Field Networks

Conditioning
Plücker Coords.
An Alternative Scene Representation
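A minimal sketch of Plücker coordinates for a ray, assuming the common convention (d, o × d) with a unit-length direction d and any point o on the ray. The function names are hypothetical.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def pluecker(origin, direction):
    """6D Plücker coordinates (d, o x d) of the ray through `origin` along `direction`."""
    norm = math.sqrt(sum(c * c for c in direction))
    d = tuple(c / norm for c in direction)
    return d + cross(origin, d)

ray_a = pluecker((1.0, 2.0, 3.0), (0.0, 0.0, 2.0))
# Same ray, but described by a different point on it and a differently scaled direction.
ray_b = pluecker((1.0, 2.0, 8.0), (0.0, 0.0, 0.5))
```

The two descriptions map to the same six numbers, so a network that takes Plücker coordinates as input sees one unambiguous code per ray regardless of which point on the ray was used to describe it.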

Rendering is learned / representation is “already rendered”
Inference
Neural Scene Representation
Neural Renderer

Rendering is learned / representation is “already rendered”
Inference
“Rendered” Neural Scene Representation
More difficult inference problem, but more general renderer.

Light Field Networks: 500 FPS, 1 evaluation per ray
Volumetric Rendering (pixelNeRF): 0.033 FPS, 196 evaluations per ray
Real-time. No post-processing, no discrete data structures (octrees, voxel grids, …).
>100x reduction in memory: can be trained on small GPUs!
About 15,000x speedup (500 FPS vs. 0.033 FPS).

Light Fields with Transformers:
Scene Representation Transformer (CVPR 2022)
No 3D renderer: directly parameterizes the light field!

Things I didn’t talk about
● Generalization in 2D, 1D, etc. neural fields: images, audio, … See LIIF (Chen et al. 2021), …
● Neural field-to-neural field translation: see Spatially-Adaptive Pixelwise Networks for Fast Image Translation (Shaham et al. 2020).
● Generalization for robotics applications: see Neural Descriptor Fields (Simeonov et al.), 3D neural scene … (Li et al., CoRL 2021), Learning Multi-Object Dynamics... (Driess et al. 2022), …
● Generalization for structured fields with known a-priori structure (humans, hands, faces, etc.).

Outlook
● Generalization is gaining traction: single-scene optimization is too limited.
● Opens up completely new ways of thinking about problems: can amortize otherwise expensive forward maps (light fields).
● Making progress on the question of compositionality with object-centric and locally conditioned neural fields. More to come.
● Processing and inferring regular grids is easy. Harder for point clouds, factorized representations, etc.
● Transformers seem to learn a type of local conditioning, but more research is necessary.

Q & A
Thanks!
