Differentiable Ray Sampling for Neural 3D Representation

N. H. Shimada
Preferred Networks 2019 Research Internship

Single-view 3D reconstruction
・Grasping ・Autonomous driving
[Yan+ ICRA 2018] [Mapillary blog]

● 3D supervision
○ A large amount of 3D data is needed.
[Kato+ CVPR 2019]
Input (image) → prediction model → Output (3D geometry)

● 2D supervision
○ End-to-end training using only 2D images.
○ Differentiable renderer is needed.
[Kato+ CVPR 2019]
Input (image) → prediction model → 3D geometry → Rendering → Output (image)

● 3D geometry representation

                           Mesh [1]   Voxel [2]   Neural 3D (SRN [3])   Neural 3D (Ours)
Initial shape              ✕          ◯           ◯                     ◯
Memory vs. resolution      ◯          ✕           ◯                     ◯
Number of training views   ◯          ◯           (✕)                   ◯
Accuracy (IoU)             0.71       0.73        -                     ???

[1] Kato+ CVPR 2018
[2] Tulsiani+ CVPR 2017
[3] Sitzmann+ arXiv 2019

DRC (Tulsiani+ CVPR 2017)
Input (image) → Encoder → Decoder → 32³ voxel grid (occupancy) → Rendered image

● Differentiable rendering

[Figure: input RGB images with ground-truth and predicted renderings]

Ours
DRC (Tulsiani+ CVPR 2017)
Voxel grid representation as a function: (x_i, y_i, z_i) → Occupancy
● 32³ discrete inputs
● Memory increases cubically with resolution

Our idea
Neural 3D representation: (x, y, z) → Occupancy
● Continuous input
● Constant memory regardless of resolution
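
The contrast between the two representations can be illustrated with a small, dependency-free sketch. The layer sizes, random initialization, and function names below are assumptions made for illustration, not the architecture used in this work. The sketch shows only that a dense grid stores resolution³ values, while a coordinate network has a fixed parameter count and accepts any continuous (x, y, z).

```python
import math
import random

def voxel_grid_cells(resolution):
    """Number of occupancy values stored by a dense voxel grid."""
    return resolution ** 3

def init_mlp(sizes, seed=0):
    """Random weights for a small fully connected network (hypothetical sizes)."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers

def mlp_param_count(layers):
    """Total number of weights and biases in the network."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)

def occupancy(layers, point):
    """Map a continuous (x, y, z) point to an occupancy value in (0, 1)."""
    h = list(point)
    for i, (weights, biases) in enumerate(layers):
        h = [sum(w * v for w, v in zip(row, h)) + b
             for row, b in zip(weights, biases)]
        if i < len(layers) - 1:
            h = [max(0.0, v) for v in h]  # ReLU on hidden layers
    return 1.0 / (1.0 + math.exp(-h[0]))  # sigmoid on the single output

net = init_mlp([3, 32, 32, 1])
print(voxel_grid_cells(32), voxel_grid_cells(64))  # grid storage grows cubically
print(mlp_param_count(net))                        # fixed, whatever resolution is queried
print(occupancy(net, (0.1, -0.25, 0.7)))           # any continuous point is a valid input
```

Doubling the grid resolution multiplies its storage by eight, whereas the network can be queried at a finer spacing without adding parameters.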

Ours
● Differentiable ray sampling
[Figure: translation probability along ray depth d, and the resulting pixel value in mask images (0 to 1)]
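
The slide gives no formula, so the following is a minimal sketch of one common way to render a mask pixel from sampled occupancies, in the spirit of DRC-style ray consistency. It is an assumption, not necessarily the sampling scheme used here. The soft sphere `sphere_occupancy`, the uniform depth sampling, and all parameter values are hypothetical stand-ins for the learned 3D network and the actual camera setup.

```python
import math

def sphere_occupancy(point, radius=0.5, sharpness=20.0):
    """Hypothetical soft occupancy: close to 1 inside a sphere, close to 0 outside."""
    dist = math.sqrt(sum(c * c for c in point))
    return 1.0 / (1.0 + math.exp(sharpness * (dist - radius)))

def render_mask_pixel(occupancy_fn, origin, direction, near, far, num_samples):
    """Probability that a ray is stopped by the shape, from sampled occupancies."""
    pass_through = 1.0  # probability that the ray has not been stopped so far
    for i in range(num_samples):
        d = near + (far - near) * (i + 0.5) / num_samples  # depth along the ray
        point = tuple(o + d * v for o, v in zip(origin, direction))
        pass_through *= 1.0 - occupancy_fn(point)
    return 1.0 - pass_through  # value to compare against the 0/1 mask pixel

# One ray through the center of the sphere and one ray that misses it.
hit = render_mask_pixel(sphere_occupancy, (0.0, 0.0, -2.0), (0.0, 0.0, 1.0), 0.5, 3.5, 64)
miss = render_mask_pixel(sphere_occupancy, (1.5, 0.0, -2.0), (0.0, 0.0, 1.0), 0.5, 3.5, 64)
print(hit, miss)
```

Every step is a product or sum of smooth terms, so the rendered pixel is differentiable with respect to the occupancy values, which is what lets a loss on 2D mask images reach the 3D representation.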

Ours
Input (image) → Encoder → Decoder → parameters of the 3D network
(x, y, z) → 3D network → Rendered image
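
The diagram indicates that the decoder outputs the parameters of the 3D network rather than a voxel grid. A minimal sketch of that idea follows, assuming the parameters arrive as one flat vector. The layer sizes, the helper names `num_params` and `unpack_params`, and the stand-in vector `flat` are hypothetical, chosen only to make the example runnable.

```python
import math

LAYER_SIZES = [3, 16, 1]  # hypothetical 3D network: (x, y, z) -> hidden -> occupancy

def num_params(sizes):
    """Length of the flat vector needed to fill every weight and bias."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))

def unpack_params(flat, sizes):
    """Split a flat parameter vector into per-layer weight rows and biases."""
    if len(flat) != num_params(sizes):
        raise ValueError("parameter vector has the wrong length")
    layers, pos = [], 0
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [flat[pos + r * n_in: pos + (r + 1) * n_in] for r in range(n_out)]
        pos += n_in * n_out
        biases = flat[pos: pos + n_out]
        pos += n_out
        layers.append((weights, biases))
    return layers

def occupancy(layers, point):
    """Evaluate the 3D network at a continuous (x, y, z) point."""
    h = list(point)
    for i, (weights, biases) in enumerate(layers):
        h = [sum(w * v for w, v in zip(row, h)) + b
             for row, b in zip(weights, biases)]
        if i < len(layers) - 1:
            h = [max(0.0, v) for v in h]  # ReLU on hidden layers
    return 1.0 / (1.0 + math.exp(-h[0]))  # sigmoid on the single output

# Stand-in for the decoder output; a real decoder would predict it from the image.
flat = [0.01 * ((i % 7) - 3) for i in range(num_params(LAYER_SIZES))]
layers = unpack_params(flat, LAYER_SIZES)
value = occupancy(layers, (0.2, 0.0, -0.4))
print(num_params(LAYER_SIZES), value)
```

With this layout, each input image yields its own small network, and the same evaluation routine serves every predicted shape.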

Results
● 1 instance
[Figure: voxelized 3D (sliced images) showing ground truth, prediction, and diff]

        IoU (DRC)
Car     0.81 (0.73)
Chair   0.53 (0.43)
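
For reference, IoU on voxelized shapes can be computed as in the sketch below. The 0.5 threshold and the flat-list grid layout are assumptions for illustration; the evaluation protocol behind the numbers above is not stated in the slides.

```python
def voxel_iou(pred, gt, threshold=0.5):
    """IoU between a predicted occupancy grid and a binary ground-truth grid.

    Both grids are given as flat lists with one value per voxel.
    """
    if len(pred) != len(gt):
        raise ValueError("grids must have the same number of cells")
    pred_bin = [p > threshold for p in pred]
    gt_bin = [g > 0.5 for g in gt]
    inter = sum(p and g for p, g in zip(pred_bin, gt_bin))
    union = sum(p or g for p, g in zip(pred_bin, gt_bin))
    return inter / union if union else 1.0  # two empty grids count as a perfect match

pred = [0.9, 0.8, 0.2, 0.6, 0.1, 0.0]
gt = [1, 1, 1, 0, 0, 0]
print(voxel_iou(pred, gt))  # 2 shared cells out of 4 occupied in either grid: 0.5
```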

Results
● Multi-instance (Qualitative)
[Figure: input RGB, ground truth, prediction, and diff for Car and Chair]

Results
● Multi-instance (Quantitative)
Accuracy (IoU)   Voxel (DRC)   Neural 3D (Ours)
Car              0.73          0.72
Chair            0.43          0.44

Results
● Multi-instance (Loss plots)
[Figure: loss plots for Car and Chair]

SRN (Sitzmann+ NIPS 2019)
Input (image) → Encoder → Decoder → parameters of the 3D network
(x, y, z) → 3D network → pixel generator → Rendered image
Depths along the ray: d0, d1, d2, ..., di; SDF (?)
The rendering part is also a network.
→ 50 images per object are needed for training
