Deep Single-View 3D Object Reconstruction with
Visual Hull Embedding
Hanqing Wang (1,2), Jiaolong Yang (2), Wei Liang (1), Xin Tong (2)
(1) Beijing Institute of Technology, Beijing, China
(2) Microsoft Research Asia, Beijing, China
AAAI 2019
Single-View 3D Reconstruction
• Input: a single RGB(D) image
• Output: the corresponding 3D representation
Single-View 3D Reconstruction
• Deep Learning based Methods:
[Girdhar ECCV’16]
[Choy ECCV’16]
Other works:
Yan NIPS’16; Wu NIPS’16; Tulsiani CVPR’17; Zhu ICCV’17…
Single-View 3D Reconstruction
• Problems of Existing Deep Learning based Methods:
• 1. Arbitrary-view images vs. Canonical-view aligned 3D shapes
• 2. Unsatisfactory results
• Missing shape details
• Plausible shapes yet inconsistent with input images
[Figure: canonical-view aligned 3D shape with X, Y, Z axes. Generation or reconstruction?]
Core Idea
• Goal: Reconstruct the object precisely from the given image
• Idea: Explicitly embed the 3D-2D projection geometry into a network
• Approach: Estimate a single-view visual hull inside the network
[Figure: multi-view visual hull vs. single-view visual hull]
Our Approach
• Perspective camera model
• Volumetric shape representation
• Method overview
Components
[Figure: network overview with panels (a) to (e). Blocks shown: 2D encoders, a regressor producing (R, T), 2D and 3D decoders, a 3D encoder, and a "+" merge.]
• (a) V-Net: coarse shape prediction
• (b) P-Net: object pose and camera parameters estimation
• (c) S-Net: silhouette prediction
• (d) PSVH layer: visual hull generation
• (e) R-Net: coarse shape refinement
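The visual hull step (d) can be illustrated with a minimal NumPy sketch. It is not the authors' PSVH layer: it assumes that each voxel center is projected with a perspective camera and simply takes the silhouette probability of the nearest pixel. The function name `single_view_visual_hull`, the `extent` of the object cube, and the example camera values are all hypothetical.

```python
import numpy as np

def single_view_visual_hull(silhouette, K, R, t, grid_size=32, extent=0.5):
    """Carve a voxel grid using a single silhouette probability map.

    silhouette: (H, W) foreground probabilities in [0, 1].
    K, R, t:    intrinsics (3x3), rotation (3x3), translation (3,).
    extent:     half side length of the object cube (assumed value).
    """
    h, w = silhouette.shape
    # Voxel centers on a regular grid in the object frame.
    axis = (np.arange(grid_size) + 0.5) / grid_size * 2 * extent - extent
    X, Y, Z = np.meshgrid(axis, axis, axis, indexing="ij")
    points = np.stack([X, Y, Z], axis=-1).reshape(-1, 3)

    # Perspective projection: object frame -> camera frame -> pixels.
    cam = points @ R.T + t
    pix = cam @ K.T
    depth = pix[:, 2]
    valid = depth > 1e-8                       # keep points in front of the camera
    safe_depth = np.where(valid, depth, 1.0)   # avoid division by zero
    u = np.rint(pix[:, 0] / safe_depth).astype(int)
    v = np.rint(pix[:, 1] / safe_depth).astype(int)
    valid &= (u >= 0) & (u < w) & (v >= 0) & (v < h)

    # Nearest-pixel lookup (an assumption of this sketch); voxels that
    # project outside the image get probability 0.
    hull = np.zeros(points.shape[0], dtype=float)
    hull[valid] = silhouette[v[valid], u[valid]]
    return hull.reshape(grid_size, grid_size, grid_size)

# Hypothetical example: a square silhouette seen by a camera 2 units away.
K_demo = np.array([[128.0, 0.0, 64.0], [0.0, 128.0, 64.0], [0.0, 0.0, 1.0]])
sil_demo = np.zeros((128, 128))
sil_demo[32:96, 32:96] = 1.0
hull_demo = single_view_visual_hull(sil_demo, K_demo, np.eye(3), np.array([0.0, 0.0, 2.0]))
print(hull_demo.shape)
```

Because every operation is a dense array computation over all voxels at once, a lookup of this kind maps naturally onto a GPU, which is in the spirit of the "GPU-friendly" claim in the summary.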
Projection Details
The relationship between a 3D point $(X, Y, Z)$ and its projected pixel location $(u, v)$ on the image is

$$Z\,[u, v, 1]^T = K\,(R\,[X, Y, Z]^T + t) \qquad (1)$$

where

$$K = \begin{bmatrix} f & 0 & u_0 \\ 0 & f & v_0 \\ 0 & 0 & 1 \end{bmatrix}$$

is the camera intrinsic matrix, $R \in SO(3)$ is the rotation matrix generated by three Euler angles, denoted $[\theta_1, \theta_2, \theta_3]$, and $t = [t_X, t_Y, t_Z]^T \in \mathbb{R}^3$ is the translation vector. For translation we estimate $t_Z$ and a 2D vector $[t_u, t_v]$ which centers the object on the image plane, and obtain $t$ via

$$t = \left[\frac{t_u}{f}\, t_Z,\ \frac{t_v}{f}\, t_Z,\ t_Z\right]^T.$$

In summary, we parameterize the pose as a 6-D vector

$$p = [\theta_1, \theta_2, \theta_3, t_u, t_v, t_Z]^T.$$
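The projection in Eq. (1) and the 6-D pose parameterization can be sketched in NumPy as follows. The slides do not state the Euler angle convention, so the order `Rz @ Ry @ Rx` used here is an assumption, as are the function names and the example values of `f`, `u0` and `v0`.

```python
import numpy as np

def euler_to_rotation(theta1, theta2, theta3):
    """Rotation matrix from three Euler angles in radians.

    Assumed convention: R = Rz(theta3) @ Ry(theta2) @ Rx(theta1).
    """
    c1, s1 = np.cos(theta1), np.sin(theta1)
    c2, s2 = np.cos(theta2), np.sin(theta2)
    c3, s3 = np.cos(theta3), np.sin(theta3)
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, c1, -s1], [0.0, s1, c1]])
    Ry = np.array([[c2, 0.0, s2], [0.0, 1.0, 0.0], [-s2, 0.0, c2]])
    Rz = np.array([[c3, -s3, 0.0], [s3, c3, 0.0], [0.0, 0.0, 1.0]])
    return Rz @ Ry @ Rx

def translation_from_pose(t_u, t_v, t_Z, f):
    """t = [t_u / f * t_Z, t_v / f * t_Z, t_Z]^T."""
    return np.array([t_u / f * t_Z, t_v / f * t_Z, t_Z])

def project(point, pose, f, u0, v0):
    """Project a 3D point to pixel coordinates (u, v) following Eq. (1).

    pose = [theta1, theta2, theta3, t_u, t_v, t_Z].
    """
    theta1, theta2, theta3, t_u, t_v, t_Z = pose
    K = np.array([[f, 0.0, u0], [0.0, f, v0], [0.0, 0.0, 1.0]])
    R = euler_to_rotation(theta1, theta2, theta3)
    t = translation_from_pose(t_u, t_v, t_Z, f)
    uvw = K @ (R @ np.asarray(point, dtype=float) + t)
    # Divide by the third homogeneous coordinate (the depth factor in Eq. 1).
    return uvw[:2] / uvw[2]

# Hypothetical values: the object origin with a pure translation.
uv = project([0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 10.0, -5.0, 2.0], f=100.0, u0=64.0, v0=64.0)
print(uv)
```

With this parameterization the object origin projects to $(u_0 + t_u,\ v_0 + t_v)$ for any $t_Z$, which is why $[t_u, t_v]$ can be read as a pixel offset that places the object on the image plane.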
Network Architecture
• Overview:
Training Loss
We use the binary cross-entropy loss to train V-Net, S-Net and R-Net. Let $p_n$ be the estimated probability at location $n$; the loss is defined as

$$l = -\frac{1}{N} \sum_{n} \left( p_n^* \log p_n + (1 - p_n^*) \log(1 - p_n) \right) \qquad (2)$$

where $p_n^*$ is the target probability.
For P-Net, we use the $L_1$ regression loss to train the network:

$$l = \sum_{i=1,2,3} \alpha\, |\theta_i - \theta_i^*| + \sum_{j=u,v} \beta\, |t_j - t_j^*| + \gamma\, |t_Z - t_Z^*| \qquad (3)$$

where we set $\alpha = 1$, $\gamma = 1$, $\beta = 0.01$.
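The two losses in Eqs. (2) and (3) can be written out in a few lines of NumPy. This is only an illustrative sketch: the function names are hypothetical, the pose vector is assumed to be ordered as $[\theta_1, \theta_2, \theta_3, t_u, t_v, t_Z]$, and the `eps` clipping is added here for numerical stability rather than taken from the slides.

```python
import numpy as np

def binary_cross_entropy(p, p_target, eps=1e-7):
    """Eq. (2): mean binary cross-entropy over all N locations."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)  # avoid log(0)
    p_target = np.asarray(p_target, dtype=float)
    return -np.mean(p_target * np.log(p) + (1.0 - p_target) * np.log(1.0 - p))

def pose_l1_loss(pose, pose_target, alpha=1.0, beta=0.01, gamma=1.0):
    """Eq. (3): weighted L1 loss on [theta1, theta2, theta3, t_u, t_v, t_Z]."""
    diff = np.abs(np.asarray(pose, dtype=float) - np.asarray(pose_target, dtype=float))
    angle_term = alpha * diff[:3].sum()   # three Euler angles
    shift_term = beta * diff[3:5].sum()   # t_u and t_v
    depth_term = gamma * diff[5]          # t_Z
    return angle_term + shift_term + depth_term

# Hypothetical values for illustration.
bce = binary_cross_entropy([0.5, 0.5], [1.0, 0.0])
pose_loss = pose_l1_loss(np.zeros(6), [0.1, 0.2, 0.3, 10.0, 20.0, 0.5])
print(bce, pose_loss)
```

The small weight $\beta = 0.01$ keeps the $t_u$ and $t_v$ errors, which are far larger in magnitude than the angle and $t_Z$ errors in this example, from dominating the sum.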
Experiments
• Object categories: car, airplane, chair, sofa
• Datasets:
• 3D-R2N2 dataset – rendered ShapeNet objects
• PASCAL 3D+ dataset – real images manually associated with a limited set of CAD models
Experiments
• Implementation details:
• Network implemented in TensorFlow
• Input image size: 128x128x3
• Output voxel grid: 32x32x32
• Running time:
• ~18 ms per image (i.e., about 55 fps)
• (Tested with a batch of 24 images on an NVIDIA Tesla M40 GPU)
Experiments
• Results on the 3D-R2N2 dataset (rendered ShapeNet objects)
• Ablation study:
Experiments
• Results on the PASCAL 3D+ dataset (real images)
Summary
• A novel 3D reconstruction neural network structure
• Embedding domain knowledge (3D-2D perspective geometry) into a DNN
• Performing reconstruction jointly with segmentation and pose estimation
• A novel, GPU-friendly Probabilistic Single-view Visual Hull layer
