2019/09/12
Survey of 3D Deep Learning (Robo+R3 Study Group)
Arithmer R3 Div. Takashi Nakano
Self-Introduction
• Takashi Nakano
• Graduate School
• Kyoto University
• Laboratory : Nuclear Theory Group
• Research : Theoretical Physics, Ph.D. (Science)
• Phase structure of the universe
• Theoretical properties of Lattice QCD
• Phase structure of graphene
• Former Job
• KOZO KEIKAKU ENGINEERING Inc.
• Contract analysis, technical support, and introduction support using fluid dynamics and powder engineering software
• Current Job
• Application of machine learning / deep learning to fluid dynamics
• e.g. https://arithmer.co.jp/2019-12-29-1/
• Application of machine learning / deep learning to 3D data
Purpose
• Purpose of this material
• Overview of 3D deep learning
• Comparison between the methods of 3D deep learning
• Main papers (this material summarizes the following papers and the papers cited therein)
• E. Ahmed et al, "A survey on Deep Learning Advances on Different 3D Data Representations",
2018
• M. M. Bronstein et al., "Geometric deep learning: going beyond Euclidean data", 2016
Application
• Application of 3D Deep Learning
• Classification
• Segmentation : per-point classification
• Correspondence : a label at each vertex (same #vertex in each model)
• Retrieval : comparison of global features
• Others : 3D data restoration from 2D images, pose estimation, etc.
(Figures from [1], [2])
[1] C. R. Qi et al., "PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation", 2016
[2] J. Masci et al., "Geodesic convolutional neural networks on Riemannian manifolds", 2015
Agenda
• Methods of 3D Deep Learning
• Euclidean vs Non-Euclidean
• Euclidean Method
• Projections / Multi-View
• Voxel
• Non-Euclidean Method
• Point Cloud / Mesh / Graph
• Accuracy
• Dataset / Material
• Appendix
• Mesh Generation
• Laplacian on Graph
• Correspondence
3D Data
• 3D Data
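Element | Point Cloud | Mesh | Graph
Vertex | 〇 | 〇 | 〇
Face | - | 〇 | -
Edge | - | - | 〇
• Point cloud : vertices $\mathcal{V}$, stored as $[(x_0, y_0, z_0), \dots, (x_N, y_N, z_N)]$
• Mesh : vertices $\mathcal{V}$ and faces $\mathcal{F}$, faces stored as $[(V_{00}, V_{01}, V_{02}), \dots, (V_{F0}, V_{F1}, V_{F2})]$
• Graph : vertices $\mathcal{V}$ and edges $\mathcal{E}$, edges stored as $[(V_{00}, V_{01}), (V_{01}, V_{02}), \dots, (V_{E2}, V_{E1})]$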
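As a minimal sketch of the three representations, the snippet below stores a toy tetrahedron as a vertex list (point cloud), adds a face list (mesh), and derives an undirected edge list (graph) from the faces. The data and the helper name `faces_to_edges` are hypothetical and only illustrate the data layout.

```python
# Point cloud: a list of (x, y, z) vertices (hypothetical toy data: a tetrahedron).
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]

# Mesh: the same vertices plus triangular faces given as vertex-index triples.
faces = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]


def faces_to_edges(faces):
    """Collect the unique undirected edges of a triangle mesh."""
    edges = set()
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            # Store each edge with the smaller index first so (u, v) == (v, u).
            edges.add((min(u, v), max(u, v)))
    return sorted(edges)


# Graph: vertices plus edges given as vertex-index pairs.
edges = faces_to_edges(faces)
```

Going from mesh to graph only drops the face information, while going from point cloud to mesh requires a separate surface reconstruction step.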
Representation
• Representation of 3D data
[1] E. Ahmed et al, "A survey on Deep Learning Advances on Different 3D Data Representations", 2018
[1]
Representation
• Representation of 3D data
Euclidean
Non-Euclidean
Grid
(Translational invariant)
Non-Grid
(not Translational invariant)
Local / Intrinsic
Global / Extrinsic
Point of view from 3D
Point of view from 2D Surface
3D / 2D
Rigid
Small deformation
Non-Rigid
Large deformation
3D CNN / 2D CNN
Non-trivial CNN
〇
×
[1]
[1] J. Masci et al. "Geodesic convolutional neural networks on Riemannian manifolds", 2015
[1]
Euclidean vs Non-Euclidean
• Euclidean
[1] E. Ahmed et al, "A survey on Deep Learning Advances on Different 3D Data Representations", 2018
[1]
Euclidean vs Non-Euclidean
• Euclidean (detail of feature)
Method | Feature | Merit | Demerit
Descriptors | Extraction of 3D topological features (SHOT, PFH, etc.) | The shape is converted into a feature vector; can be applied to each problem | The geometric properties of the shape are lost
Projections | Projection of 3D to 2D | - | The geometric properties of the shape are lost
RGB-D | RGB + depth map | Can use data from RGB-D sensors (Kinect / RealSense) as input | Needs a depth map; infers only some of the 3D properties, based on the depth
Volumetric | Voxelization | Extension of 2D CNN (grid information) | Needs large memory; needs high resolution for detailed shapes (e.g. segmentation)
Multi-View | 2D images from multiple angles | Highest accuracy among Euclidean methods | Needs multi-view images
Euclidean vs Non-Euclidean
• Non-Euclidean
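Representation | Data | Characteristic | Issue
Point Cloud | $\mathcal{V}$ | Unordered point cloud; no connectivity information between points | Depends on noise and density of the point cloud
Mesh | $(\mathcal{V}, \mathcal{F})$ | Connectivity information between points | Need to convert from point cloud to mesh
Graph | $(\mathcal{V}, \mathcal{E})$ | Graph (vertex, edge) | Need to create a graph-type CNN
• Unordered : $[(x_0, y_0, z_0), (x_1, y_1, z_1)]$ and $[(x_1, y_1, z_1), (x_0, y_0, z_0)]$ represent the same point cloud
[1] R. Hanocka et al., "MeshCNN: A Network with an Edge", 2018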
Euclidean vs Non-Euclidean
• Non-Euclidean (detail of feature)
Point Cloud
• Feature : treats point clouds directly; needs to keep translational and rotational invariance; treats unordered points; no connectivity information between points
• Merit : original data is often a point cloud, e.g. scanned data (no CAD data, terrain data) in civil engineering, architecture, medical care, fashion
• Demerit : must handle noise; depends on the density of the point cloud; needs interpolation between points; cannot distinguish between closely spaced points
Mesh
• Feature : treats mesh data; connectivity information between points; converts mesh data to a structure to which a CNN can be applied
• Merit : CAD data, e.g. design in manufacturing; can keep the geometry with few mesh elements
• Demerit : point cloud must be converted to mesh data
Graph
• Feature : treats the mesh as a graph; vertex (node); edge (connectivity information between points)
• Merit : same as Mesh
• Demerit : need to create a graph-type CNN (non-trivial)
Euclidean
• Representation of 3D data
[1] E. Ahmed et al, "A survey on Deep Learning Advances on Different 3D Data Representations", 2018
[1]
Euclidean
• Each Euclidean Method (Projections / RGB-D / Volumetric / Multi-View)
Method | Application | Link
Deep Pano | Classification | Paper
Two-stream CNNs on RGB-D | Classification | Paper
VoxNet | Classification | Paper, GitHub (Keras)
MVCNN | Classification, Retrieval | Paper, GitHub (PyTorch/TensorFlow etc.)
Euclidean
• Deep Pano [1]
• Projection to a panoramic image
• Row-wise max-pooling for rotational invariance
[1] B. Shi et al. "DeepPano: Deep Panoramic Representation for 3D Shape Recognition", 2017
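A small sketch of why row-wise max-pooling gives rotational invariance: rotating the object about its up-axis shifts the columns of the panorama cyclically, and the maximum of each row does not change under such a shift. The feature map values and function names below are hypothetical, and the effect of the preceding convolution layers is ignored.

```python
def row_wise_max_pool(feature_map):
    """Reduce each row of a 2D feature map to its maximum value."""
    return [max(row) for row in feature_map]


def rotate_panorama(feature_map, shift):
    """Model a rotation about the up-axis as a cyclic shift of the columns."""
    return [row[shift:] + row[:shift] for row in feature_map]


# Hypothetical 3 x 4 panoramic feature map (rows: height, columns: angle).
panorama = [[0.1, 0.7, 0.3, 0.2],
            [0.9, 0.4, 0.5, 0.0],
            [0.2, 0.2, 0.8, 0.6]]

pooled = row_wise_max_pool(panorama)
pooled_rotated = row_wise_max_pool(rotate_panorama(panorama, 1))
```

Both `pooled` and `pooled_rotated` contain the same values, whatever shift is chosen.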
Euclidean
• Two-stream CNNs on RGB-D [1]
• Concatenate the features of the RGB CNN and the depth-map CNN
[1] A. Eitel et al. "Multimodal Deep Learning for Robust RGB-D Object Recognition", 2015
Euclidean
• VoxNet [1]
• Voxelization : conversion of a 3D point cloud into a voxel grid
• Not robust to data loss
[1] D. Maturana et al. "VoxNet: A 3D Convolutional Neural Network for Real-Time Object Recognition", 2015
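The voxelization step can be sketched as below, assuming the point cloud is already normalized to the unit cube and using a binary occupancy grid as a simplification. The function name `voxelize` and the sample points are hypothetical.

```python
def voxelize(points, resolution):
    """Map points in the unit cube [0, 1]^3 to a binary occupancy grid."""
    grid = [[[0] * resolution for _ in range(resolution)]
            for _ in range(resolution)]
    for x, y, z in points:
        # Clamp so that a coordinate of exactly 1.0 falls into the last voxel.
        i = min(int(x * resolution), resolution - 1)
        j = min(int(y * resolution), resolution - 1)
        k = min(int(z * resolution), resolution - 1)
        grid[i][j][k] = 1
    return grid


# Hypothetical point cloud: the first two points are close to each other.
points = [(0.1, 0.1, 0.1), (0.12, 0.11, 0.13), (0.9, 0.5, 0.2)]
grid = voxelize(points, resolution=4)
occupied = sum(v for plane in grid for row in plane for v in row)
```

The two nearby points fall into the same voxel, which illustrates why detailed shapes need a high resolution, and the memory grows with the cube of `resolution`.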
Euclidean
• MVCNN [1]
• Merge the CNN features of the individual view images
[1] H. Su et al. "Multi-view Convolutional Neural Networks for 3D Shape Recognition", 2015
[1]
Non-Euclidean (Point Clouds)
• Representation of 3D data
[1] E. Ahmed et al, "A survey on Deep Learning Advances on Different 3D Data Representations", 2018
[1]
Non-Euclidean (Point Clouds)
• Each Non-Euclidean Method (Point Cloud)
Method | Application | Link
PointNet | Classification, Segmentation, Retrieval, Correspondence | Paper, GitHub (TensorFlow)
PointNet++ | Classification, Segmentation, Retrieval, Correspondence | Paper, GitHub (TensorFlow), PyTorch-geometric (PointConv)
Dynamic Graph CNN (DGCNN) | Classification, Segmentation | Paper, GitHub (PyTorch/TensorFlow), PyTorch-geometric (DynamicEdgeConv)
PointCNN | Classification, Segmentation | Paper, GitHub (TensorFlow), PyTorch-geometric (XConv)
※ Some equations on the following pages are taken from the PyTorch-geometric documentation
(https://pytorch-geometric.readthedocs.io/en/latest/modules/nn.html).
PyTorch-geometric is explained on a later page.
Non-Euclidean (Point Clouds)
• PointNet [1]
• Treat unordered point cloud by max-pooling (symmetric function)
• $f(x_1, \dots, x_n) = g(h(x_1), \dots, h(x_n))$
• $h$ : MLP applied to the input feature of each point, $g$ : max-pooling
• Predict an affine transformation (translational, rotational invariance)
• Similar to Spatial Transformer Networks in 2D
• Classification : global feature
• Part segmentation (per-point classification) : global + local feature
• Preprocessing : randomly rotating the object along the up-axis, normalization into the unit sphere
• Comparison with PointNet++
• Detailed information is lost
• Cannot treat different densities of point cloud
[1] C. R. Qi et al., "PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation", 2016
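The role of the symmetric function can be checked with a minimal sketch: a per-point map `h` followed by channel-wise max-pooling `g` gives the same global feature for any ordering of the points. Here `h` is a fixed hand-written function standing in for the shared MLP; it is a hypothetical placeholder, not the trained network.

```python
def h(point):
    """Hypothetical per-point feature map standing in for the shared MLP."""
    x, y, z = point
    return (x + y, y * z, max(x, z), x * x)


def g(features):
    """Symmetric function: channel-wise max-pooling over all points."""
    return tuple(max(channel) for channel in zip(*features))


def f(points):
    """Global feature f(x_1, ..., x_n) = g(h(x_1), ..., h(x_n))."""
    return g([h(p) for p in points])


cloud = [(0.0, 1.0, 2.0), (3.0, 0.5, 0.1), (1.0, 1.0, 1.0)]
shuffled = [cloud[2], cloud[0], cloud[1]]
global_feature = f(cloud)
global_feature_shuffled = f(shuffled)
```

Because only the per-channel maximum survives, fine local detail is discarded, which is the weakness that PointNet++ addresses.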
Non-Euclidean (Point Clouds)
• PointNet
• T-Net [1]
• Similar to Spatial Transformer Networks in 2D
• Spatial Transformer Networks
• Alignment of images (translation, rotation, distortion, etc.) by spatial transformation
• Learns the affine transformation from the input data (no special data is required)
• Can be inserted at any point between the layers of a network
Reference | Contents
Paper | Original paper
Sample (PyTorch) | Dataset : MNIST
[1] M. Jaderberg et al. "Spatial transformer networks",2015
[1]
Non-Euclidean (Point Clouds)
• PointNet
• Spatial Transformer Networks
• Localization net : outputs the parameters $\theta$ of the transformation applied to the input feature map $U$
• Combination of Conv, MaxPool, ReLU, FC
• Output : $2 \times 3$ matrix $A_\theta$
• Grid generator : creates the sampling grid from the parameters
• Maps each target (output) coordinate to a source (input) coordinate :
$\begin{pmatrix} x_i^s \\ y_i^s \end{pmatrix} = \mathcal{T}_\theta(G_i) = A_\theta \begin{pmatrix} x_i^t \\ y_i^t \\ 1 \end{pmatrix}$
• Sampler : outputs the transformed feature map $V$ by sampling the pixels of $U$ on the grid
Figure : Spatial Transformer Networks (2D), input feature map to output feature map [1]
[1] M. Jaderberg et al. "Spatial transformer networks", 2015
Non-Euclidean (Point Clouds)
• PointNet
• T-Net
• 3D ver. of Spatial Transformer Networks in 2D
• No sampling grid is needed (there is no grid structure in a 3D point cloud)
• The transformation is applied directly to each point
• Output parameter
• 3 × 3 in first T-Net
• 64 × 64 in second T-Net
T-Net
(input feature : 3)
T-Net
(input feature : 64)
[1]
[1] C. R. Qi, "PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation", 2016
Non-Euclidean (Point Clouds)
• PointNet++ [1]
• Comparison b/w PointNet
• Detailed information is kept
• Can treat different density of point cloud
Concatenation of multi-resolution
[1] C. R. Qi et al. "Pointnet++: Deep hierarchical feature learning on point sets in a metric space", 2017
Non-Euclidean (Point Clouds)
• PointNet++
• Set abstraction
• Grouping in one scale + feature extraction
• Sampling Layer : Extraction of sampling points by farthest point sampling (FPS)
• Grouping Layer : Grouping points around sampling points
• PointNet Layer : Applying PointNet
Sampling Layer Grouping Layer
𝑟
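The sampling and grouping layers can be sketched as follows, assuming Euclidean distance and Python 3.8+ for `math.dist`. The function names and the toy points are hypothetical; the PointNet layer that would process each group is omitted.

```python
import math


def farthest_point_sampling(points, num_samples, start=0):
    """Greedily pick points that are as far as possible from those already chosen."""
    chosen = [start]
    # Distance from every point to its nearest chosen point so far.
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(chosen) < num_samples:
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        chosen.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return chosen


def ball_query(points, center_index, radius):
    """Group the indices of all points within `radius` of a sampled point."""
    center = points[center_index]
    return [i for i, p in enumerate(points) if math.dist(p, center) <= radius]


points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (1.0, 0.0, 0.0),
          (1.0, 0.1, 0.0), (0.0, 1.0, 0.0)]
centroids = farthest_point_sampling(points, num_samples=3)
groups = [ball_query(points, c, radius=0.2) for c in centroids]
```

Each group would then be passed through a small PointNet to produce one feature vector per sampled point.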
Non-Euclidean (Point Clouds)
• PointNet++
• Point Feature Propagation for segmentation
• Interpolation : inverse-distance-weighted interpolation of features from the $k$ nearest points ($k = 3$)
$f^{(j)}(x) = \frac{\sum_{i=1}^{k} w_i(x) f_i^{(j)}}{\sum_{i=1}^{k} w_i(x)}, \quad w_i(x) = \frac{1}{d(x, x_i)^2}$
• $f_i^{(j)}$ : feature, $w_i(x)$ : weight (inverse of the squared distance $d$ between $x$ and $x_i$)
• Concatenation
[1] C. R. Qi et al. "Pointnet++: Deep hierarchical feature learning on point sets in a metric space", 2017
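A minimal sketch of the interpolation used in feature propagation, with weights equal to the inverse squared distance and `k = 3` neighbors. The small constant `eps` that avoids division by zero is an assumption added here, and the function name and data are hypothetical.

```python
import math


def interpolate_feature(x, known_points, known_features, k=3, eps=1e-8):
    """Inverse-squared-distance weighted average over the k nearest known points."""
    nearest = sorted(range(len(known_points)),
                     key=lambda i: math.dist(x, known_points[i]))[:k]
    # eps (an assumption of this sketch) avoids division by zero when x hits a known point.
    weights = [1.0 / (math.dist(x, known_points[i]) ** 2 + eps) for i in nearest]
    total = sum(weights)
    num_channels = len(known_features[0])
    return [sum(w * known_features[i][c] for w, i in zip(weights, nearest)) / total
            for c in range(num_channels)]


known_points = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0), (10.0, 10.0, 10.0)]
known_features = [[1.0], [3.0], [5.0], [100.0]]
# The query point is equidistant from the first three points, so the result is their mean.
value = interpolate_feature((1.0, 1.0, 0.0), known_points, known_features)
```

The distant fourth point is not among the three nearest neighbors and does not affect the result.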
Non-Euclidean (Point Clouds)
• PointNet++
• Single scale grouping
• Multi scale/resolution grouping
• Combination of features from different scales
• Robust for non-uniform sampling density
• Modifying architecture in set abstraction level
𝐿𝑖−1 Level
𝐿𝑖 Level
Original Points
multi-resolution grouping (MRG)
Concatenation of information of multi-resolution
High computational cost
multi-scale grouping (MSG)
Recommendation
[1] C. R. Qi et al. "Pointnet++: Deep hierarchical feature learning on point sets in a metric space", 2017
[1] [1]
Non-Euclidean (Point Clouds)
• PointNet++
• Detail of architecture
• Note: #vertex is fixed (#input vertex: 1024)
• Architecture for classification and part segmentation of ModelNet using single scale grouping
• Set abstraction (same in cls. and seg.) :
$SA(512, 0.2, [64, 64, 128]) \to SA(128, 0.2, [64, 64, 128]) \to SA([256, 512, 1024])$
(#vertex : $512 = 1024/2$, then $128 = 512/4$)
• For classification : $\to FC(512, 0.5) \to FC(256, 0.5) \to FC(K)$ (class)
• For part segmentation : $\to FP(256, 256) \to FP(256, 128) \to FP(128, 128, 128, 128, K)$ (per-point segmentation)
• Notation
• $SA(K, r, [\ell_1, \dots, \ell_d])$ : set abstraction level with #vertex $K$, radius $r$, and a PointNet with $d$ fully connected layers (#FC: $d$) of channels $\ell_i$
• $SA([\ell_1, \dots, \ell_d])$ : global set abstraction level (#FC: $d$), converts the set into a single vector by max-pooling
• $FC(\ell, dp)$ : fully connected layer with $\ell$ channels and dropout ratio $dp$
• $FP(\ell_1, \dots, \ell_d)$ : feature propagation level (#FC: $d$)
Non-Euclidean (Point Clouds)
• PointNet++
• Detail of architecture
• Note: #vertex is fixed (#input vertex: 1024)
• Architecture for classification of ModelNet using multi-resolution grouping (MRG)
$SA(512, 0.2, [64, 64, 128]) \to SA(64, 0.4, [128, 128, 256])$
$SA(512, 0.4, [64, 128, 256])$
$SA(64, 128, 256, 512)$
$SA([256, 512, 1024])$
• The branch outputs are merged by concatenation (Concat.)
$\to FC(512, 0.5) \to FC(256, 0.5) \to FC(K)$ (class; same as single scale grouping)
Non-Euclidean (Point Clouds)
• Dynamic Graph CNN (DGCNN) [1]
• PointNet combined with EdgeConv
• EdgeConv
• Create local edge structure dynamically (not fixed in each layer)
• Search neighbors in feature space by kNN
$\boldsymbol{x}_i' = \sum_{j \in N(i)} h_{\boldsymbol{\Theta}}(\boldsymbol{x}_i, \boldsymbol{x}_j - \boldsymbol{x}_i)$
• $\boldsymbol{x}_i$ : global information, $\boldsymbol{x}_j - \boldsymbol{x}_i$ : local information
[1] Y. Wang et al., "Dynamic Graph CNN for Learning on Point Clouds", 2018
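The EdgeConv update can be sketched in a few lines, assuming sum aggregation as in the formula on this page and Python 3.8+ for `math.dist`. The edge function `h_theta` below is a hypothetical stand-in (concatenation followed by ReLU, no learned weights); in the real layer it is a trained MLP.

```python
import math


def knn_indices(features, i, k):
    """Indices of the k nearest neighbors of point i in feature space (i excluded)."""
    others = [j for j in range(len(features)) if j != i]
    others.sort(key=lambda j: math.dist(features[i], features[j]))
    return others[:k]


def h_theta(x_i, diff):
    """Hypothetical edge function: concatenate (x_i, x_j - x_i) and apply ReLU."""
    return [max(0.0, v) for v in list(x_i) + list(diff)]


def edge_conv(features, k):
    """x_i' = sum over the k nearest neighbors j of h_theta(x_i, x_j - x_i)."""
    out = []
    for i, x_i in enumerate(features):
        acc = [0.0] * (2 * len(x_i))
        for j in knn_indices(features, i, k):
            diff = [b - a for a, b in zip(x_i, features[j])]
            acc = [s + m for s, m in zip(acc, h_theta(x_i, diff))]
        out.append(acc)
    return out


features = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
updated = edge_conv(features, k=2)
```

Stacking several such layers and recomputing `knn_indices` on the output of the previous layer is what makes the graph dynamic.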
Non-Euclidean (Point Clouds)
• PointCNN [1]
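• Downsampling information from neighborhoods into fewer representative points (Χ-Conv.)
• Lower resolution, deeper channels (decreasing #representative points)
$\boldsymbol{x}_i' = \mathrm{Conv}\left(\boldsymbol{K}, \gamma_{\Theta}(\boldsymbol{P}_i - \boldsymbol{p}_i) \times \left(h_{\Theta}(\boldsymbol{P}_i - \boldsymbol{p}_i) \,\|\, \boldsymbol{x}_i\right)\right)$
• $\boldsymbol{K}$ : kernel, $\boldsymbol{x}_i$ : input feature, $\|$ : concatenation
• $h_{\Theta}$ : MLP applied individually on each point, like PointNet
[1] Y. Li et al. "PointCNN: Convolution On X-Transformed Points", 2018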
Non-Euclidean (Mesh)
• Representation of 3D data
[1] E. Ahmed et al, "A survey on Deep Learning Advances on Different 3D Data Representations", 2018
[1]
Non-Euclidean (Mesh)
• Each Non-Euclidean Method (Mesh)
Method | Application | Link
MeshCNN | Classification, Segmentation | Paper, GitHub (PyTorch)
MeshNet | Classification | Paper, GitHub (PyTorch)
Non-Euclidean (Mesh)
• MeshCNN [1]
• Edge collapse by pooling
• Applicable only to manifold meshes
• Input feature (per edge) : angle, length
• Pooling / unpooling (used in segmentation)
[1] R. Hanocka et al., "MeshCNN: A Network with an Edge", 2018
Non-Euclidean (Mesh)
• MeshNet
• Input feature
• Center, corner, normal, neighbor index
Information of neighborhood of face
Mesh Conv.
(Combination + Aggregation)
[1] Y. Feng et al. "MeshNet: Mesh Neural Network for 3D Shape Representation", 2018
[1]
[1]
[1]
Non-Euclidean (Graph)
• Representation of 3D data
[1] E. Ahmed et al, "A survey on Deep Learning Advances on Different 3D Data Representations", 2018
[1]
Non-Euclidean (Graph)
• Each Non-Euclidean Method (Graph)
• Spectral / Spatial Method
• Spectral method : generalization of the Fourier basis
• Laplacian eigenfunctions : $\Delta \phi_i = \lambda_i \phi_i$
• Euclidean (1D) : $\phi_i = e^{i \omega x}$, $\lambda_i = \omega^2$
• Spatial method : local coordinates
• Patch operator : $D_j(x) f = \sum_{y \in N(x)} \omega_j(\boldsymbol{u}(x, y)) f(y)$, with pseudo-coordinate $\boldsymbol{u}(x, y)$
• Convolution : $(f \ast g)(x) = \sum_{j=1}^{J} g_j D_j(x) f$
Figures : Euclidean (1D) and non-Euclidean (manifold) cases for each method [1], [2], [3]
[1] M. M. Bronstein et al., "Geometric deep learning: going beyond Euclidean data", 2016
[2] J. Masci et al. "Geodesic convolutional neural networks on Riemannian manifolds", 2015
[3] M. Fey et al. "SplineCNN: Fast Geometric Deep Learning with Continuous B-Spline Kernels", 2017
Non-Euclidean (Graph)
• Each Non-Euclidean Method (Graph)
• Spatial method is more useful than spectral method.
Method | Structure | Feature
Spectral | Fourier basis on the manifold; Laplacian eigenvalues / eigenvectors | Spectral filter coefficients are basis dependent in some methods; no locality in some methods; high computational cost
Spatial | Creates local coordinates; patch operator + conv. | Locality; efficient computation
※ Some equations on the following pages are taken from the PyTorch-geometric documentation
(https://pytorch-geometric.readthedocs.io/en/latest/modules/nn.html).
PyTorch-geometric is explained on a later page.
Non-Euclidean (Graph)
• Each Non-Euclidean Method (Graph)
• Spectral, Spectral free
Method | Type | Application | Link
Spectral CNN | Spectral | Graph | Paper
Chebyshev Spectral CNN (ChebNet) | Spectral free | Graph | Paper, GitHub (TensorFlow), PyTorch-geometric (ChebConv)
Graph Convolutional Network (GCN) | Spectral free | Graph | Paper, PyTorch-geometric (GCNConv)
Graph Neural Network (GNN) | Spectral free | Graph | Paper
Non-Euclidean (Graph)
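• Spectral CNN [2]
• Cannot be used across different shapes
• Spectral filter coefficients are basis dependent
• Different shape -> different basis -> different result
• High computational cost
• No locality
• Laplacian : $(\Delta f)_i \propto \sum_{(i,j) \in \mathcal{E}} \omega_{ij} (f_i - f_j)$
• Spectral convolution layer :
$\boldsymbol{f}_\ell^{out} = \xi\left(\sum_{\ell'=1}^{p} \boldsymbol{\Phi}_k \boldsymbol{G}_{\ell,\ell'} \boldsymbol{\Phi}_k^T \boldsymbol{f}_{\ell'}^{in}\right)$
• $\boldsymbol{\Phi}_k$ : Laplacian eigenvectors, $\xi$ : ReLU
[1] M. M. Bronstein et al., "Geometric deep learning: going beyond Euclidean data", 2016
[2] J. Bruna et al. "Spectral Networks and Locally Connected Networks on Graphs", 2013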
Non-Euclidean (Graph)
• Chebyshev Spectral CNN (ChebNet) [1]
• Does not calculate the Laplacian eigenvectors directly
• Locality (K hops)
• Approximates the filter as a polynomial
$X' = \sum_{k=0}^{K-1} Z^{(k)} \cdot \Theta^{(k)}$
$Z^{(0)} = X, \quad Z^{(1)} = L \cdot X, \quad Z^{(k)} = 2 \cdot L \cdot Z^{(k-1)} - Z^{(k-2)}$
• $L$ : scaled and normalized Laplacian
• Graph Convolutional Network (GCN) [2]
• Special ver. of ChebNet ($K = 2$)
[1] M. Defferrard et al. "Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering", 2016
[2] T. N. Kipf et al. "Semi-Supervised Classification with Graph Convolutional Networks", 2016
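The Chebyshev recurrence can be sketched for a single scalar feature per node, so that each weight $\Theta^{(k)}$ is one number. The function names and the two-node example are hypothetical; the example uses the scaled Laplacian $2 L_{sym} / \lambda_{max} - I$ of a graph with one edge.

```python
def matvec(matrix, vector):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def cheb_conv(l_hat, x, thetas):
    """X' = sum_k Z^(k) * Theta^(k), with Z^(k) = 2 L Z^(k-1) - Z^(k-2)."""
    z_prev, z_curr = None, list(x)                      # Z^(0) = X
    out = [thetas[0] * v for v in z_curr]
    for k in range(1, len(thetas)):
        if k == 1:
            z_next = matvec(l_hat, z_curr)              # Z^(1) = L X
        else:
            lz = matvec(l_hat, z_curr)
            z_next = [2.0 * a - b for a, b in zip(lz, z_prev)]
        out = [o + thetas[k] * z for o, z in zip(out, z_next)]
        z_prev, z_curr = z_curr, z_next
    return out


# Scaled and normalized Laplacian of a two-node graph with a single edge.
l_hat = [[0.0, -1.0],
         [-1.0, 0.0]]
x = [1.0, 0.0]
result = cheb_conv(l_hat, x, thetas=[1.0, 2.0, 3.0])   # K = 3
```

With `len(thetas) == 2` only $Z^{(0)}$ and $Z^{(1)}$ are used, which corresponds to the $K = 2$ case mentioned for GCN. Only matrix-vector products with the Laplacian are needed, so no eigendecomposition is computed.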
Non-Euclidean (Graph)
• Each Non-Euclidean Method (Graph)
• Charting
Method | Application | Link
Geodesic CNN | Mesh; shape retrieval / correspondence | Paper
Anisotropic CNN | Mesh / point cloud; shape correspondence | Paper
MoNet | Graph / mesh / point cloud; shape correspondence | Paper, PyTorch-geometric (GMMConv)
SplineCNN | Graph / mesh; classification, shape correspondence | Paper, GitHub (PyTorch), PyTorch-geometric (SplineConv)
FeaStNet | Graph / mesh; shape correspondence, segmentation | Paper, PyTorch-geometric (FeaStConv)
Non-Euclidean (Graph)
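• Geodesic CNN (GCNN) [1] ⊂ Anisotropic CNN (ACNN) [2] ⊂ MoNet [3]
• Patch operator : $D_j(x) f = \sum_{y \in N(x)} \omega_j(\boldsymbol{u}(x, y)) f(y)$, with pseudo-coordinate $\boldsymbol{u}(x, y)$
• Convolution : $(f \ast g)(x) = \sum_{j=1}^{J} g_j D_j(x) f$
• MoNet : $\omega_j(\boldsymbol{u}) = \exp\left(-\frac{1}{2} (\boldsymbol{u} - \boldsymbol{\mu}_j)^T \boldsymbol{\Sigma}_j^{-1} (\boldsymbol{u} - \boldsymbol{\mu}_j)\right)$
• $\boldsymbol{\mu}_j$, $\boldsymbol{\Sigma}_j$ : learning parameters
• ACNN : $\omega_j(\boldsymbol{u}) = \exp\left(-\frac{1}{2} \boldsymbol{u}^T \boldsymbol{R}_{\theta_j} \begin{pmatrix} \alpha & 0 \\ 0 & 1 \end{pmatrix} \boldsymbol{R}_{\theta_j}^T \boldsymbol{u}\right)$
• $\boldsymbol{R}_{\theta_j}$ : rotation of $\theta$ to the maximum curvature direction, $\alpha$ : the degree of anisotropy
• GCNN : $\omega_j(\boldsymbol{u}) = \exp\left(-\frac{1}{2} (\boldsymbol{u} - \boldsymbol{u}_j)^T \begin{pmatrix} \sigma_\rho^2 & 0 \\ 0 & \sigma_\theta^2 \end{pmatrix}^{-1} (\boldsymbol{u} - \boldsymbol{u}_j)\right)$
• $\sigma_\rho^2$, $\sigma_\theta^2$ : covariance (radius, angle direction)
[1] J. Masci et al. "Geodesic convolutional neural networks on Riemannian manifolds", 2015
[2] D. Boscaini et al. "Learning shape correspondence with anisotropic convolutional neural networks", 2016
[3] F. Monti et al. "Geometric deep learning on graphs and manifolds using mixture model CNNs", 2016
[4] M. Fey et al. "SplineCNN: Fast Geometric Deep Learning with Continuous B-Spline Kernels", 2017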
Non-Euclidean (Graph)
• Geodesic CNN (GCNN)
• Create local coordinate
• No guarantee that the chart is meaningful (the chart radius must be kept small)
• Anisotropic CNN (ACNN)
• Fourier basis is based on anisotropic heat diffusion eq.
• MoNet
• Learn filter as parametric kernel
• Generalization of geodesic CNN and anisotropic CNN
Non-Euclidean (Graph)
• SplineCNN [1]
• Filter based on B-spline functions
• Efficient computation
$\boldsymbol{x}_i' = \frac{1}{|N(i)|} \sum_{j \in N(i)} \boldsymbol{x}_j \cdot h_{\boldsymbol{\Theta}}(\boldsymbol{e}_{i,j})$
• $h_{\boldsymbol{\Theta}}$ : weighted B-spline basis
[1] M. Fey et al. "SplineCNN: Fast Geometric Deep Learning with Continuous B-Spline Kernels", 2017
Non-Euclidean (Graph)
• FeaStNet [1]
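• Dynamically determines the relation between the filter weights and the local graph of a node
• Euclidean (pixel grid) : $\boldsymbol{x}_i' = \sum_{m=1}^{M} \boldsymbol{W}_m \boldsymbol{x}_{n(m,i)}$
• $M$ : filter size (e.g. $M = 3 \times 3 = 9$), $\boldsymbol{W}_m$ : weight from $D$ input features to $E$ output features
• FeaStNet : $\boldsymbol{x}_i' = \frac{1}{|N(i)|} \sum_{j \in N(i)} \sum_{m=1}^{M} q_m(\boldsymbol{x}_i, \boldsymbol{x}_j) \boldsymbol{W}_m \boldsymbol{x}_j$
• $q_m(\boldsymbol{x}_i, \boldsymbol{x}_j) = \mathrm{softmax}_j\left(\boldsymbol{u}_m^T (\boldsymbol{x}_i - \boldsymbol{x}_j) + c_m\right)$
• $|N(i)|$ : #neighbor (e.g. $N = 6$)
[1] N. Verma et al. "FeaStNet: Feature-Steered Graph Convolutions for 3D Shape Analysis", 2017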
Non-Euclidean (Graph)
• PyTorch-geometric
• https://github.com/rusty1s/pytorch_geometric
• Library based on PyTorch
• For point cloud, mesh (not only graph)
• Includes code for point cloud and graph-type approaches
• PointNet++, DGCNN, PointCNN
• ChebNet, GCN, MoNet, SplineCNN, FeaStNet
• Easy to download well-known sample datasets and convert them to a common data format
• ModelNet, ShapeNet, etc.
• Many examples and benchmarks
Accuracy
• Accuracy (Classification)
• around 90% in any method (except VoxNet)
Method | "ModelNet40" Overall Acc. [%] / Mean class Acc. [%] | "SHREC" Overall Acc. [%]
VoxNet | 85.9 / 83.0 | −
MVCNN | − / 90.1 | 96.09 *
PointNet | 89.2 / 86.0 | −
PointNet++ | 90.7 / − | −
DGCNN | 92.9 * / 90.2 * | −
PointCNN | 92.2 / 88.1 | −
MeshNet | − / 91.9 | −
MeshCNN | − / − | 91.0
(* : value shown in bold on the original slide)
※ Please refer to each paper (cited on each page) for details.
Accuracy
• Accuracy (Segmentation)
Method | Part segmentation "ShapeNet" mIoU (mean per-class part-averaged IoU) [%] | Part segmentation "ScanNet" Acc. [%] | Part segmentation "COSEG" Acc. [%] | Scene segmentation "S3DIS" Acc. [%] / mIoU [%] | Human body segmentation (including SCAPE, FAUST, etc.) Acc. [%]
PointNet | 80.4 (57.9 − 95.3) | 73.9 | 54.4 − 91.5 | 78.6 / − | 90.77
PointNet++ | 81.9 (58.7 − 95.3) | 84.5 | 79.1 − 98.9 | − / − | −
DGCNN | 82.3 (63.5 − 95.7 *) | − | − | 84.1 * / 56.1 | −
PointCNN | 84.6 * | 85.1 * | − | − / 65.39 * | −
MeshCNN | − | − | 97.56 − 99.63 * | − / − | 92.30 *
FeaStNet | 81.5 | − | − | − / − | −
(* : value shown in bold on the original slide)
※ Please refer to each paper (cited on each page) for details.
Dataset
• 3D Dataset
Dataset | Contents | Data Format | Purpose | PyTorch-geometric
ModelNet10/40 | 3D CAD model (10 or 40 classes) | Mesh (.OFF) | Classification | ModelNet
ShapeNet | 3D shape | Point cloud (.pts) | Segmentation | ShapeNet
ScanNet | Indoor scan data | Mesh (.ply) | Segmentation | -
S3DIS (original, .h5) | Indoor scan data | Point cloud | Segmentation | S3DIS
ScanNet : registration required
S3DIS : registration required (for original)
Dataset
• 3D Dataset
Dataset | Contents | Data Format | Purpose | PyTorch-geometric
SHREC | Many types, one for each contest | - | Retrieval | -
SHREC2016 | Animal, human (part data) | Mesh (.OFF) | Correspondence | SHREC2016
TOSCA | Animal, human | Mesh (same #vertices in each category, separate files for vertices and triangles) | Correspondence | TOSCA
PCPNet | 3D shape | Point cloud (.xyz) (including normal and curvature files) | Estimation of local shape (normal, curvature) | PCPNet
FAUST | Human body | Mesh | Correspondence | FAUST
FAUST : registration required
Material
• Material of 3D deep learning (3D / point cloud)
Paper | Comment
A survey on Deep Learning Advances on Different 3D Data Representations | Review of 3D deep learning; easy to read; written from the viewpoint of Euclidean and non-Euclidean methods
Paperwithcode | Papers with code about 3D
Point Cloud Deep Learning Survey Ver. 2 | Deep learning for point clouds; survey of many papers
Material
• Material of 3D deep learning (graph)
Paper | Comment
Geometric deep learning: going beyond Euclidean data | Review of geometric deep learning
Geometric Deep Learning | Summary of papers and code about geometric deep learning
Geometric Deep Learning on Graphs and Manifolds (NIPS2017) | Presentation (YouTube) about geometric deep learning
Summary
• There are many methods of 3D deep learning.
• Two main approaches
• Euclidean vs Non-Euclidean
• Euclidean Method
• Projections / Multi-View / Voxel
• Non-Euclidean Method
• Point Cloud / Mesh / Graph
• Each method has merits and demerits.
• We need to choose the appropriate method for each data type and application.
• Research on 3D deep learning is growing.
Appendix
• Appendix
• Mesh Generation
• Laplacian on Graph
• Correspondence
Appendix : Mesh Generation
• Mesh Generation
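• This appendix summarizes the following materials.
Link | Contents
Point cloud surface reconstruction (点群面張り, Japan Society for Precision Engineering) | Surface reconstruction
Mesh processing (メッシュ処理, Japan Society for Precision Engineering) | Mesh processing
CV Study Group @ Kanto presentation: Survey on point cloud reconstruction (CV勉強会@関東発表資料 点群再構成に関するサーベイ) | Survey of point cloud reconstruction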
Appendix : Mesh Generation
• Difficulty of Mesh Generation
Processing | Difficulty
Pre-processing | Reduction of noise / missing data / abnormal values / density difference of vertices
Post-processing | Mesh smoothing / hole filling
Figure : ground truth; noise / abnormal value; missing data / density difference of vertices; mesh smoothing / hole filling
Appendix : Mesh Generation
• Kinds of Mesh Generation
Kind | Feature | Classification of the method
Direct Triangulation | Direct mesh generation from point cloud | Explicit method
Surface Smoothness | Smooth surface mesh from point cloud | Implicit method
Figure : Direct Triangulation, Surface Smoothness
Appendix : Mesh Generation
• Classification of the method
• In general, the implicit method is easier to use, since point clouds contain noise.
Classification of the method | Information to use | Influence of noise and density of vertices | Guarantee of accuracy
Explicit method | Vertices | Large (error of vertices = error of meshes) | ◎
Implicit method | Meshes based on the isosurface of a function field calculated from the vertices | Small | 〇
Appendix : Mesh Generation
• Kinds of Mesh Generation (Detail)
• Direct Triangulation (example of built-in function in MeshLab)
Method | Feature
Voronoi-Based Surface Reconstruction | Creates the Delaunay triangulation of the vertices together with the Voronoi vertices obtained from the Voronoi diagram
Ball-Pivoting Algorithm | Rolls a ball over the point cloud and generates a mesh from the points located within a certain distance
Appendix : Mesh Generation
• Voronoi-Based Surface Reconstruction
• Voronoi diagram
• Regions divided by the bisectors between vertices (in 2D)
• Delaunay triangulation
• Triangulation by connecting the vertices
Figure (example in 2D) : vertices (S), bisectors, Voronoi vertices (V), Delaunay triangulation of S and V (black + red lines), surface (black line) [1]
[1] N. Amenta et al. "A New Voronoi-Based Surface Reconstruction Algorithm", 1998
Appendix : Mesh Generation
• Ball-Pivoting Algorithm
Figure : ball pivoting for ideal data, sparse data (mesh not created), and close points (mesh not created) [1]
[1] F. Bernardini et al. "The Ball-Pivoting Algorithm for Surface Reconstruction", 1999
Appendix : Mesh Generation
• Kinds of Mesh Generation (Detail)
• Surface Smoothness (example of built-in function in MeshLab)
Method | Feature
Signed distance function + Marching Cubes | Creates a signed distance function from the distance between vertices and surface, then generates the mesh by Marching Cubes
Screened Poisson surface reconstruction (Poisson surface reconstruction) | Distinguishes between inside and outside of the surface by using the Poisson eq.
Appendix : Mesh Generation
• Signed distance function + Marching Cubes
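• Steps : oriented tangent planes -> estimated signed distance -> output of modified marching cubes
• Signed distance of a point $\boldsymbol{p}$ to the tangent plane with center $\boldsymbol{o}$ and normal $\boldsymbol{n}$ : $f(\boldsymbol{p}) = (\boldsymbol{p} - \boldsymbol{o}) \cdot \boldsymbol{n}$
• $f(\boldsymbol{p}) > 0$ -> outside, $f(\boldsymbol{p}) < 0$ -> inside, $f(\boldsymbol{p}) = 0$ -> surface
[1] H. Hoppe et al. "Surface Reconstruction from Unorganized Points", 1992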
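The signed distance to a single oriented tangent plane can be sketched as below, assuming a unit normal and that the tangent plane nearest to the query point has already been selected. The function names and the tolerance `tol` are hypothetical.

```python
def signed_distance(p, o, n):
    """f(p) = (p - o) . n for a tangent plane with center o and unit normal n."""
    return sum((pi - oi) * ni for pi, oi, ni in zip(p, o, n))


def classify(p, o, n, tol=1e-9):
    """Label a point as outside (f > 0), inside (f < 0), or on the surface (f = 0)."""
    d = signed_distance(p, o, n)
    if abs(d) <= tol:
        return "surface"
    return "outside" if d > 0 else "inside"


# Hypothetical tangent plane z = 1 with the normal pointing in +z.
center = (0.0, 0.0, 1.0)
normal = (0.0, 0.0, 1.0)
labels = [classify(p, center, normal)
          for p in [(0.3, 0.2, 1.5), (0.0, 0.0, 0.5), (2.0, 3.0, 1.0)]]
```

Marching Cubes then extracts the zero level set of this function sampled on a voxel grid.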
Appendix : Mesh Generation
• Screened Poisson surface reconstruction
• Obtain the indicator function $\chi$ by solving the Poisson eq.
• Poisson eq. : $\Delta \chi \equiv \nabla \cdot \nabla \chi = \nabla \cdot \boldsymbol{V}$
Figure : Poisson surface reconstruction [1] and Screened Poisson surface reconstruction [2] (interpolation between points)
[1] M. Kazhdan et al. "Poisson Surface Reconstruction", 2006
[2] M. Kazhdan et al. "Screened Poisson Surface Reconstruction", 2013
Appendix : Laplacian on Graph
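• Laplacian on Graph [1]
• Graph (undirected) : $(\mathcal{V}, \mathcal{E})$, $\mathcal{V} = \{1, \dots, n\}$, $\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$
• Weights : vertex weight $a_i$, edge weight $\omega_{ij}$
• Functions : $f: \mathcal{V} \to \mathbb{R}$, $F: \mathcal{E} \to \mathbb{R}$
• Grad. : $(\nabla f)_{ij} = f_i - f_j$
• Div. : $(\mathrm{div} F)_i = \frac{1}{a_i} \sum_{j:(i,j) \in \mathcal{E}} \omega_{ij} F_{ij}$
• Laplacian : $\Delta \equiv -\mathrm{div} \nabla$, $(\Delta f)_i = \frac{1}{a_i} \sum_{(i,j) \in \mathcal{E}} \omega_{ij} (f_i - f_j)$
• Laplacian eigenvalues : $\lambda \geq 0$
• Mesh ($\ell_{ij}$ : edge length, $a_{ijk}$ : triangle area)
$\omega_{ij} = \frac{-\ell_{ij}^2 + \ell_{jk}^2 + \ell_{ki}^2}{8 a_{ijk}} + \frac{-\ell_{ij}^2 + \ell_{jh}^2 + \ell_{hi}^2}{8 a_{ijh}} = \frac{1}{2} (\cot \alpha_{ij} + \cot \beta_{ij})$
$a_i = \frac{1}{3} \sum_{jk:(i,j,k) \in \mathcal{F}} a_{ijk}$
$a_{ijk} = \left[ s_{ijk} (s_{ijk} - \ell_{ij}) (s_{ijk} - \ell_{jk}) (s_{ijk} - \ell_{ki}) \right]^{1/2}$
$s_{ijk} = \frac{1}{2} (\ell_{ij} + \ell_{jk} + \ell_{ki})$
[1] M. M. Bronstein et al., "Geometric deep learning: going beyond Euclidean data", 2016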
Appendix : Laplacian on Graph
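• Laplacian on Graph [1]
• Laplacian (as matrix) : $\Delta \boldsymbol{f} = \boldsymbol{A}^{-1} (\boldsymbol{D} - \boldsymbol{W}) \boldsymbol{f}$
• $\boldsymbol{f} = (f_1, \dots, f_n)^T$, $\boldsymbol{W} = (\omega_{ij})$, $\boldsymbol{A} = \mathrm{diag}(a_1, \dots, a_n)$, $\boldsymbol{D} = \mathrm{diag}\left(\sum_{j: j \neq i} \omega_{ij}\right)$
Laplacian | $\Delta$ | Condition
Unnormalized graph Laplacian | $\Delta = \boldsymbol{D} - \boldsymbol{W}$ | $A = I$
Normalized symmetric Laplacian | $\Delta = \boldsymbol{I} - \boldsymbol{D}^{-1/2} \boldsymbol{W} \boldsymbol{D}^{-1/2}$ | $A = D$ + normalization
Random walk Laplacian | $\Delta = \boldsymbol{I} - \boldsymbol{D}^{-1} \boldsymbol{W}$ | $A = D$
[1] M. M. Bronstein et al., "Geometric deep learning: going beyond Euclidean data", 2016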
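The three matrix forms can be computed for a small graph as sketched below, assuming a symmetric weight matrix with zero diagonal and no isolated vertices. The symmetric form is taken here as $I - D^{-1/2} W D^{-1/2}$; the function name and the path graph are hypothetical.

```python
import math


def graph_laplacians(weights):
    """Return the unnormalized, symmetric normalized, and random walk Laplacians."""
    n = len(weights)
    # Degree d_i = sum of the edge weights attached to vertex i.
    deg = [sum(weights[i][j] for j in range(n) if j != i) for i in range(n)]

    def eye(i, j):
        return 1.0 if i == j else 0.0

    unnormalized = [[deg[i] * eye(i, j) - weights[i][j] for j in range(n)]
                    for i in range(n)]
    symmetric = [[eye(i, j) - weights[i][j] / math.sqrt(deg[i] * deg[j])
                  for j in range(n)] for i in range(n)]
    random_walk = [[eye(i, j) - weights[i][j] / deg[i] for j in range(n)]
                   for i in range(n)]
    return unnormalized, symmetric, random_walk


# Path graph 0 - 1 - 2 with unit edge weights.
W = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 1.0],
     [0.0, 1.0, 0.0]]
unnormalized, symmetric, random_walk = graph_laplacians(W)
```

The rows of the unnormalized and random walk Laplacians sum to zero, so a constant function is an eigenvector with eigenvalue 0.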
Appendix : Laplacian on Graph
• Laplacian on Graph [1]
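• Convolution : $(f \ast g)(x) = \sum_{i \geq 0} \hat{f}_i \hat{g}_i \phi_i(x)$, with Fourier coefficients $\hat{f}_i = \langle f, \phi_i \rangle$, $\hat{g}_i = \langle g, \phi_i \rangle$
• Matrix form : $\boldsymbol{f} \ast \boldsymbol{g} = \boldsymbol{\Phi} \, \mathrm{diag}(\hat{\boldsymbol{g}}) \, \boldsymbol{\Phi}^T \boldsymbol{f}$
• $\boldsymbol{f} = (f_1, \dots, f_n)^T$, $\hat{\boldsymbol{g}} = (\hat{g}_1, \dots, \hat{g}_n)$, $\boldsymbol{\Phi} = (\phi_1, \dots, \phi_n)$
[1] M. M. Bronstein et al., "Geometric deep learning: going beyond Euclidean data", 2016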
Appendix :Correspondence
• Correspondence [1]
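• Input : query vertex $x_i$ and reference vertices $(y_1, \dots, y_N)$
• Label : each query vertex is labeled with one of the reference vertices (all reference vertices are the classes)
• Output (probability) : $(p_1, p_2, \dots, p_N)$
• Correct label (one-hot vector) : $(1, 0, \dots, 0)$
• Loss : calculated from the output probability and the correct label
[1] M. M. Bronstein et al., "Geometric deep learning: going beyond Euclidean data", 2016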
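Treating correspondence as classification over the reference vertices, the loss for one query vertex can be sketched as a softmax followed by cross-entropy against the one-hot label. The choice of cross-entropy is an assumption of this sketch (the slide only says "Loss"), and the function names and logits are hypothetical.

```python
import math


def softmax(logits):
    """Convert per-reference-vertex scores into probabilities (p_1, ..., p_N)."""
    m = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def correspondence_loss(logits, target_index):
    """Cross-entropy between the predicted probabilities and the one-hot label."""
    probabilities = softmax(logits)
    return -math.log(probabilities[target_index])


# Hypothetical scores of one query vertex against N = 4 reference vertices.
uniform_loss = correspondence_loss([0.0, 0.0, 0.0, 0.0], target_index=0)
confident_loss = correspondence_loss([5.0, 0.0, 0.0, 0.0], target_index=0)
```

With uniform scores the loss equals log N, and it decreases as the probability of the correct reference vertex grows.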
Summary of survey papers on deep learning method to 3D data

More Related Content

PDF
SGD+α: 確率的勾配降下法の現在と未来
PDF
論文紹介 Pixel Recurrent Neural Networks
PDF
データ解析7 主成分分析の基礎
PDF
SSII2022 [SS2] 少ないデータやラベルを効率的に活用する機械学習技術 〜 足りない情報をどのように補うか?〜
PDF
ゼロから始める転移学習
PPTX
HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Est...
PPTX
[DL輪読会]Deep High-Resolution Representation Learning for Human Pose Estimation
PDF
文献紹介:Rethinking Data Augmentation for Image Super-resolution: A Comprehensive...
SGD+α: 確率的勾配降下法の現在と未来
論文紹介 Pixel Recurrent Neural Networks
データ解析7 主成分分析の基礎
SSII2022 [SS2] 少ないデータやラベルを効率的に活用する機械学習技術 〜 足りない情報をどのように補うか?〜
ゼロから始める転移学習
HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Est...
[DL輪読会]Deep High-Resolution Representation Learning for Human Pose Estimation
文献紹介:Rethinking Data Augmentation for Image Super-resolution: A Comprehensive...

What's hot (20)

PDF
【DL輪読会】Egocentric Video Task Translation (CVPR 2023 Highlight)
PDF
第52回SWO研究会チュートリアル資料
PPTX
AWSでGPUも安く大量に使い倒せ
PPTX
SimGAN 輪講資料
PDF
Semantic Segmentation - Fully Convolutional Networks for Semantic Segmentation
PPTX
Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recog...
PPTX
[DL輪読会]Pyramid Stereo Matching Network
PDF
30th コンピュータビジョン勉強会@関東 DynamicFusion
PDF
[DL輪読会] Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields
PDF
ICML 2021 Workshop 深層学習の不確実性について
PDF
Stockfish NNUEプロジェクト - 第25回ゲームプログラミングワークショップ (GPW-20)
PPTX
The review of 'Explaining nonlinear classification decisions with deep Taylor...
PDF
ICCV19読み会 "Learning Single Camera Depth Estimation using Dual-Pixels"
PPTX
[DL輪読会]Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial...
PPTX
[DL輪読会]Wav2CLIP: Learning Robust Audio Representations From CLIP
PPTX
【DL輪読会】Reflash Dropout in Image Super-Resolution
PDF
Kaggle Happywhaleコンペ優勝解法でのOptuna使用事例 - 2022/12/10 Optuna Meetup #2
PDF
文献紹介:Simple Copy-Paste Is a Strong Data Augmentation Method for Instance Segm...
PPTX
3次元計測とフィルタリング
PPTX
Graph Neural Networks
【DL輪読会】Egocentric Video Task Translation (CVPR 2023 Highlight)
第52回SWO研究会チュートリアル資料
AWSでGPUも安く大量に使い倒せ
SimGAN 輪講資料
Semantic Segmentation - Fully Convolutional Networks for Semantic Segmentation
Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recog...
[DL輪読会]Pyramid Stereo Matching Network
30th コンピュータビジョン勉強会@関東 DynamicFusion
[DL輪読会] Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields
ICML 2021 Workshop 深層学習の不確実性について
Stockfish NNUEプロジェクト - 第25回ゲームプログラミングワークショップ (GPW-20)
The review of 'Explaining nonlinear classification decisions with deep Taylor...
ICCV19読み会 "Learning Single Camera Depth Estimation using Dual-Pixels"
[DL輪読会]Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial...
[DL輪読会]Wav2CLIP: Learning Robust Audio Representations From CLIP
【DL輪読会】Reflash Dropout in Image Super-Resolution
Kaggle Happywhaleコンペ優勝解法でのOptuna使用事例 - 2022/12/10 Optuna Meetup #2
文献紹介:Simple Copy-Paste Is a Strong Data Augmentation Method for Instance Segm...
3次元計測とフィルタリング
Graph Neural Networks
Ad




Summary of survey papers on deep learning method to 3D data

  • 1. 2019/09/12 Survey of 3D Deep Learning (Robo+R3 Study Group) Arithmer R3 Div. Takashi Nakano
  • 2. Self-Introduction • Takashi Nakano • Graduate School • Kyoto University • Laboratory: Nuclear Theory Group • Research: Theoretical Physics, Ph.D. (Science) • Phase structure of the universe • Theoretical properties of Lattice QCD • Phase structure of graphene • Former Job • KOZO KEIKAKU ENGINEERING Inc. • Contract analysis / technical support / deployment support for fluid dynamics and powder engineering software • Current Job • Application of machine learning / deep learning to fluid dynamics • e.g. https://arithmer.co.jp/2019-12-29-1/ • Application of machine learning / deep learning to 3D data
  • 3. Purpose
      • Purpose of this material
          • Overview of 3D deep learning
          • Comparison b/w the methods of 3D deep learning
      • Main papers (this material summarizes the following papers and the papers cited therein)
          • E. Ahmed et al., "A survey on Deep Learning Advances on Different 3D Data Representations", 2018
          • M. M. Bronstein et al., "Geometric deep learning: going beyond Euclidean data", 2016
  • 4. Application
      • Application of 3D Deep Learning
          • Classification, Segmentation, Correspondence, Retrieval
          • 3D data restoration from 2D images, pose estimation, etc.
      • Figure annotations (figures from [1] and [2]): per-point classification; each label at each vertex; same #vertex at each model; comparison of global feature
      • [1] C. R. Qi, "PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation", 2016
      • [2] J. Masci et al., "Geodesic convolutional neural networks on Riemannian manifolds", 2015
  • 5. Agenda • Methods of 3D Deep Learning • Euclidean vs Non-Euclidean • Euclidean Method • Projections / Multi-View • Voxel • Non-Euclidean Method • Point Cloud / Mesh / Graph • Accuracy • Dataset / Material • Appendix • Mesh Generation • Laplacian on Graph • Correspondence
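  • 6. 3D Data
      • 3D Data
      • Columns: Element | Point Cloud | Mesh | Graph
          • Vertex | 〇 | 〇 | 〇
          • Face | - | 〇 | -
          • Edge | - | - | 〇
      • Point Cloud: $\mathcal{V} = [(x_0, y_0, z_0), \dots, (x_N, y_N, z_N)]$
      • Mesh: $(\mathcal{V}, \mathcal{F})$ with $\mathcal{F} = [(V_{00}, V_{01}, V_{02}), \dots, (V_{F0}, V_{F1}, V_{F2})]$
      • Graph: $(\mathcal{V}, \mathcal{E})$ with $\mathcal{E} = [(V_{00}, V_{01}), (V_{01}, V_{02}), \dots, (V_{E2}, V_{E1})]$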
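The three representations on this slide can be sketched as plain arrays. This is a minimal illustration, assuming NumPy and a hand-made tetrahedron; the helper name `faces_to_edges` is hypothetical and not taken from the slides.

```python
import numpy as np

# Point cloud: N x 3 array of vertex coordinates (a tetrahedron here).
vertices = np.array([[0.0, 0.0, 0.0],
                     [1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0],
                     [0.0, 0.0, 1.0]])

# Mesh: the vertices plus F x 3 triangles, each row holding three vertex indices.
faces = np.array([[0, 1, 2],
                  [0, 1, 3],
                  [0, 2, 3],
                  [1, 2, 3]])

def faces_to_edges(faces):
    """Collect the unique undirected edges (i < j) of a triangle mesh."""
    pairs = np.concatenate([faces[:, [0, 1]], faces[:, [1, 2]], faces[:, [2, 0]]])
    pairs = np.sort(pairs, axis=1)      # make (i, j) and (j, i) identical
    return np.unique(pairs, axis=0)     # drop edges shared by two faces

# Graph: the vertices plus E x 2 edges.
edges = faces_to_edges(faces)
```

Going from mesh to graph only discards the face information, which is why the graph column of the table has edges but no faces.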
  • 7. Representation • Representation of 3D data [1] • [1] E. Ahmed et al., "A survey on Deep Learning Advances on Different 3D Data Representations", 2018
  • 8. Representation
      • Representation of 3D data
      • Euclidean
          • Grid (translation invariant)
          • Global / extrinsic: point of view from 3D
          • Rigid, small deformation
          • 2D CNN / 3D CNN can be applied (〇)
      • Non-Euclidean
          • Non-grid (not translation invariant)
          • Local / intrinsic: point of view from the 2D surface
          • Non-rigid, large deformation
          • Standard CNN cannot be applied (×); a non-trivial CNN is needed
      • [1] J. Masci et al., "Geodesic convolutional neural networks on Riemannian manifolds", 2015
  • 9. Euclidean vs Non-Euclidean • Euclidean [1] • [1] E. Ahmed et al., "A survey on Deep Learning Advances on Different 3D Data Representations", 2018
  • 10. Euclidean vs Non-Euclidean
      • Euclidean (detail of feature)
      • Columns: Method | Feature | Merit | Demerit
          • Descriptors | Extraction of 3D topological features (SHOT, PFH, etc.) | Can be converted to features; can be applied to each problem | The geometric properties of the shape are lost.
          • Projections | Projection of 3D to 2D | - | The geometric properties of the shape are lost.
          • RGB-D | RGB + depth map | Can use data from RGB-D sensors (Kinect / RealSense) as input | Needs a depth map; infers only some of the 3D properties, based on the depth.
          • Volumetric | Voxelization | Extension of 2D CNN (grid information) | Needs large memory; needs high resolution for detailed shapes (e.g. segmentation).
          • Multi-View | 2D images from multiple angles | Highest accuracy among the Euclidean methods | Needs multi-view images.
  • 11. Euclidean vs Non-Euclidean
      • Non-Euclidean
      • Point Cloud: $\mathcal{V}$
          • Unordered point cloud: $[(x_0, y_0, z_0), (x_1, y_1, z_1)]$ and $[(x_1, y_1, z_1), (x_0, y_0, z_0)]$ are the same data
          • No connectivity information b/w points
          • Depends on the noise and density of the point cloud
      • Mesh: $(\mathcal{V}, \mathcal{F})$
          • Connectivity information b/w points
          • Need to convert from point cloud to mesh
      • Graph: $(\mathcal{V}, \mathcal{E})$
          • Graph (vertex, edge)
          • Need to create a graph-type CNN
      • [1] R. Hanocka et al., "MeshCNN: A Network with an Edge", 2018
  • 12. Euclidean vs Non-Euclidean
      • Non-Euclidean (detail of feature)
      • Point Cloud
          • Feature: treats point clouds; needs to keep translational and rotational invariance; treats unordered points; no connectivity information b/w points
          • Merit: the original data is often a point cloud, e.g. scanned data (no CAD data, terrain data); civil engineering, architecture, medical care, fashion
          • Demerit: must handle noise; depends on the density of the point cloud; needs interpolation b/w points; cannot distinguish b/w close points
      • Mesh
          • Feature: treats mesh data; connectivity information b/w points; converts mesh data to a structure to which CNN can be applied
          • Merit: CAD data, e.g. design in manufacturing; can keep the geometry with few mesh elements
          • Demerit: need to convert point cloud to mesh data
      • Graph
          • Feature: treats a mesh as a graph; vertex (node); edge (connectivity information b/w points)
          • Merit: same as Mesh
          • Demerit: need to create a graph-type CNN (non-trivial)
  • 14. Euclidean • Representation of 3D data [1] • [1] E. Ahmed et al., "A survey on Deep Learning Advances on Different 3D Data Representations", 2018
  • 15. Euclidean
      • Each Euclidean Method (Projections / RGB-D / Volumetric / Multi-View)
      • Columns: Method | Application | Link
          • Deep Pano | Classification | Paper
          • Two-stream CNNs on RGB-D | Classification | Paper
          • VoxNet | Classification | Paper, GitHub (Keras)
          • MVCNN | Classification, Retrieval | Paper, GitHub (PyTorch / TensorFlow etc.)
  • 16. Euclidean • Deep Pano [1] • Projection to a panoramic image • Row-wise max-pooling for rotational invariance • [1] B. Shi et al., "DeepPano: Deep Panoramic Representation for 3D Shape Recognition", 2017
  • 17. Euclidean • Two-stream CNNs on RGB-D [1] • Concatenate the outputs of a CNN on RGB and a CNN on the depth map • [1] A. Eitel et al., "Multimodal Deep Learning for Robust RGB-D Object Recognition", 2015
  • 18. Euclidean • VoxNet [1] • Voxelization: converts a 3D point cloud to voxels • Not robust to data loss • [1] D. Maturana et al., "VoxNet: A 3D Convolutional Neural Network for Real-Time Object Recognition", 2015
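The voxelization step can be sketched as follows. This is a minimal binary occupancy grid, assuming NumPy; it is not the VoxNet implementation, and the function name `voxelize` and the default resolution are illustrative choices.

```python
import numpy as np

def voxelize(points, resolution=32):
    """Convert an (N, 3) point cloud into a binary occupancy grid."""
    mins = points.min(axis=0)
    extent = (points.max(axis=0) - mins).max()
    extent = extent if extent > 0 else 1.0
    # One uniform scale for all axes keeps the aspect ratio of the shape.
    idx = np.floor((points - mins) / extent * (resolution - 1)).astype(int)
    grid = np.zeros((resolution,) * 3, dtype=bool)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return grid
```

Memory grows with the cube of `resolution`, which is the "needs large memory" demerit listed for volumetric methods.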
  • 19. Euclidean • MVCNN [1] • Merge the CNN outputs of each image • [1] H. Su et al., "Multi-view Convolutional Neural Networks for 3D Shape Recognition", 2015
  • 20. Non-Euclidean (Point Clouds) • Representation of 3D data [1] • [1] E. Ahmed et al., "A survey on Deep Learning Advances on Different 3D Data Representations", 2018
  • 21. Non-Euclidean (Point Clouds)
      • Each Non-Euclidean Method (Point Cloud)
      • Columns: Method | Application | Link
          • PointNet | Classification, Segmentation, Retrieval, Correspondence | Paper, GitHub (TensorFlow)
          • PointNet++ | Classification, Segmentation, Retrieval, Correspondence | Paper, GitHub (TensorFlow), PyTorch-geometric (PointConv)
          • Dynamic Graph CNN (DGCNN) | Classification, Segmentation | Paper, GitHub (PyTorch / TensorFlow), PyTorch-geometric (DynamicEdgeConv)
          • PointCNN | Classification, Segmentation | Paper, GitHub (TensorFlow), PyTorch-geometric (XConv)
      • Note: Some equations on the following pages are taken from the PyTorch-geometric documentation (https://pytorch-geometric.readthedocs.io/en/latest/modules/nn.html). PyTorch-geometric is explained on a later page.
  • 22. Non-Euclidean (Point Clouds)
      • PointNet [1]
      • Treats an unordered point cloud with max-pooling (symmetric function)
          • $f(x_1, \cdots, x_n) = g(h(x_1), \cdots, h(x_n))$, with $x_i$: input feature, $h$: MLP, $g$: max-pooling
      • Predicts an affine transformation (translational, rotational invariance), similar to Spatial Transformer Networks in 2D
      • Classification uses the global feature; part segmentation (per-point classification) uses global + local features
      • Preprocessing: random rotation of the object along the up-axis, normalization into a unit sphere
      • Comparison b/w PointNet++
          • Detailed information is lost
          • Cannot treat different densities of the point cloud
      • [1] C. R. Qi, "PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation", 2016
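The symmetric-function idea, $f = g(h(x_1), \cdots, h(x_n))$, can be checked numerically. The sketch below assumes NumPy and uses random, untrained weights `W1` and `W2` as a stand-in for the shared MLP $h$; the layer sizes are arbitrary and not those of PointNet.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 16))    # hypothetical weights of a 2-layer shared MLP h
W2 = rng.normal(size=(16, 32))

def global_feature(points):
    """f(x_1..x_n) = g(h(x_1), ..., h(x_n)) with h = shared MLP, g = max-pooling."""
    hidden = np.maximum(points @ W1, 0.0)   # h is applied to every point independently
    per_point = np.maximum(hidden @ W2, 0.0)
    return per_point.max(axis=0)            # g: symmetric max over the point axis

points = rng.normal(size=(100, 3))
shuffled = points[rng.permutation(len(points))]
same = np.allclose(global_feature(points), global_feature(shuffled))
```

Because the maximum does not depend on the order of its arguments, `same` holds for any permutation of the input points.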
  • 23. Non-Euclidean (Point Clouds)
      • PointNet
      • T-Net
          • Similar to Spatial Transformer Networks in 2D [1]
      • Spatial Transformer Networks
          • Alignment of an image (translation, rotation, distortion, etc.) by spatial transformation
          • Learns the affine transformation from the input data (no special data is needed)
          • Can be inserted at any point b/w the layers of a network
      • Columns: Reference | Contents
          • Paper | Original paper
          • Sample (PyTorch) | Dataset: MNIST
      • [1] M. Jaderberg et al., "Spatial transformer networks", 2015
  • 24. Non-Euclidean (Point Clouds)
      • PointNet
      • Spatial Transformer Networks (2D) [1]
          • Localization net: outputs the transformation parameters $\theta$ from the input feature map $U$
              • Combination of Conv, MaxPool, ReLU, FC
              • Output: $2 \times 3$
          • Grid generator: creates the sampling grid from the parameters
              • $\begin{pmatrix} x_i^s \\ y_i^s \end{pmatrix} = \mathcal{T}_\theta(G_i) = A_\theta \begin{pmatrix} x_i^t \\ y_i^t \\ 1 \end{pmatrix}$, where $A_\theta$ is $2 \times 3$, $(x_i^t, y_i^t)$ is a pixel of the output feature map and $(x_i^s, y_i^s)$ is the sampling position in the input feature map
          • Sampler: outputs the transformed feature map $V$
      • [1] M. Jaderberg et al., "Spatial transformer networks", 2015
  • 25. Non-Euclidean (Point Clouds)
      • PointNet [1]
      • T-Net
          • 3D version of Spatial Transformer Networks in 2D
          • No sampling grid is needed (a point cloud has no grid structure)
          • The transformation is applied directly to each point
          • Output parameters
              • $3 \times 3$ in the first T-Net (input feature: 3)
              • $64 \times 64$ in the second T-Net (input feature: 64)
      • [1] C. R. Qi, "PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation", 2016
  • 26. Non-Euclidean (Point Clouds)
      • PointNet++ [1]
      • Comparison b/w PointNet
          • Detailed information is kept
          • Can treat different densities of the point cloud
      • Concatenation of multi-resolution features
      • [1] C. R. Qi et al., "Pointnet++: Deep hierarchical feature learning on point sets in a metric space", 2017
  • 27. Non-Euclidean (Point Clouds)
      • PointNet++
      • Set abstraction
          • Grouping in one scale + feature extraction
          • Sampling Layer: extraction of sampling points by farthest point sampling (FPS)
          • Grouping Layer: grouping of the points within radius $r$ around each sampling point
          • PointNet Layer: applying PointNet to each group
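The sampling and grouping layers can be sketched as below. This is a simplified NumPy illustration under the assumption of a small point cloud held in memory; the function names are hypothetical, and the reference implementation additionally caps the number of points per group.

```python
import numpy as np

def farthest_point_sampling(points, n_samples, start=0):
    """Return indices of n_samples points chosen greedily to be far apart."""
    chosen = [start]
    # Distance from every point to its nearest already-chosen point.
    dist = np.linalg.norm(points - points[start], axis=1)
    for _ in range(n_samples - 1):
        nxt = int(dist.argmax())
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return np.array(chosen)

def ball_query(points, centers, radius):
    """Grouping: for each center, indices of the points within `radius`."""
    d = np.linalg.norm(points[None, :, :] - centers[:, None, :], axis=2)
    return [np.flatnonzero(row <= radius) for row in d]
```

A set abstraction level would then apply a PointNet to the points of each group returned by `ball_query`.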
  • 28. Non-Euclidean (Point Clouds)
      • PointNet++ [1]
      • Point Feature Propagation for segmentation
          • Interpolation: interpolation from the $k$ neighbor points ($k = 3$)
              • $f^{(j)}(x) = \frac{\sum_{i=1}^{k} w_i(x) f_i^{(j)}}{\sum_{i=1}^{k} w_i(x)}$, with $f_i^{(j)}$: feature, $w_i(x)$: weight
              • $w_i(x) = \frac{1}{d(x, x_i)^2}$ (inverse of the squared distance $d$ b/w $x$ and $x_i$)
          • Concatenation
      • [1] C. R. Qi et al., "Pointnet++: Deep hierarchical feature learning on point sets in a metric space", 2017
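The interpolation formula on this slide can be written directly in NumPy. The sketch below is an illustration with $k = 3$ as on the slide; the small `eps` term is an added assumption to avoid division by zero when the query coincides with a known point.

```python
import numpy as np

def interpolate_features(x, known_xyz, known_feat, k=3, eps=1e-8):
    """Inverse-distance-weighted feature at x from its k nearest known points."""
    d = np.linalg.norm(known_xyz - x, axis=1)
    nearest = np.argsort(d)[:k]
    w = 1.0 / (d[nearest] ** 2 + eps)       # w_i(x) = 1 / d(x, x_i)^2
    return (w[:, None] * known_feat[nearest]).sum(axis=0) / w.sum()
```

For a query at the origin with known points at distances 1, 1 and 2 carrying features 1, 3 and 10, the weights are 1, 1 and 0.25, so the interpolated feature is 6.5 / 2.25.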
  • 29. Non-Euclidean (Point Clouds)
      • PointNet++ [1]
      • Single-scale grouping
      • Multi-scale / multi-resolution grouping
          • Combination of features from different scales
          • Robust to non-uniform sampling density
          • Modifies the architecture in the set abstraction level
          • Multi-scale grouping (MSG): high computational cost
          • Multi-resolution grouping (MRG): concatenation of multi-resolution information (features from the $L_{i-1}$ level and from the original points at the $L_i$ level); recommended
      • [1] C. R. Qi et al., "Pointnet++: Deep hierarchical feature learning on point sets in a metric space", 2017
  • 30. Non-Euclidean (Point Clouds)
      • PointNet++
      • Detail of architecture
      • Note: #vertex is fixed
      • Architecture for classification and part segmentation of ModelNet using single-scale grouping
          • #input vertex: 1024
          • Same in cls. and seg.: $SA(512, 0.2, [64, 64, 128]) \to SA(128, 0.2, [64, 64, 128]) \to SA([256, 512, 1024])$ (512 = 1024/2, 128 = 512/4)
          • For classification: $\to FC(512, 0.5) \to FC(256, 0.5) \to FC(K)$ (class)
          • For part segmentation: $\to FP(256, 256) \to FP(256, 128) \to FP(128, 128, 128, 128, K)$ (per-point segmentation)
      • Notation
          • $SA(K, r, [\ell_1, \cdots, \ell_d])$: set abstraction level with $K$ vertices (#vertex), radius $r$, and a PointNet with $d$ FC layers
          • $SA([\ell_1, \cdots, \ell_d])$: global set abstraction level with $d$ FC layers; converts the set to a single vector by max-pooling
          • $FC(\ell, dp)$: fully connected layer with $\ell$ channels and dropout ratio $dp$
          • $FP(\ell_1, \cdots, \ell_d)$: feature propagation level with $d$ FC layers
  • 31. Non-Euclidean (Point Clouds)
      • PointNet++
      • Detail of architecture
      • Note: #vertex is fixed
      • Architecture for classification of ModelNet using multi-resolution grouping (MRG)
          • #input vertex: 1024
          • Branch 1: $SA(512, 0.2, [64, 64, 128]) \to SA(64, 0.4, [128, 128, 256])$
          • Branch 2: $SA(512, 0.4, [64, 128, 256])$
          • Branch 3: $SA([64, 128, 256, 512])$
          • Branch 4: $SA([256, 512, 1024])$
          • The branch outputs are merged by two concatenations (Concat.)
          • Same as single-scale grouping: $\to FC(512, 0.5) \to FC(256, 0.5) \to FC(K)$ (class)
  • 32. Non-Euclidean (Point Clouds)
      • Dynamic Graph CNN (DGCNN) [1]
      • PointNet + w/ Edge Conv.
      • Edge Conv.
          • Creates the local edge structure dynamically (not fixed in each layer)
          • Searches neighbors in feature space by kNN
          • $\boldsymbol{x}_i' = \sum_{j \in N(i)} h_{\boldsymbol{\Theta}}(\boldsymbol{x}_i, \boldsymbol{x}_j - \boldsymbol{x}_i)$, with $\boldsymbol{x}_i$: global information, $\boldsymbol{x}_j - \boldsymbol{x}_i$: local information
      • [1] Y. Wang, "Dynamic Graph CNN for Learning on Point Clouds", 2018
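A single Edge Conv. step can be sketched as follows, assuming NumPy. The function $h_{\boldsymbol{\Theta}}$ is replaced here by a single linear layer with ReLU and the aggregation is the sum written on the slide; both are simplifying assumptions, and `knn_indices` and `edge_conv` are illustrative names.

```python
import numpy as np

def knn_indices(feats, k):
    """Indices of the k nearest neighbours of every row (self excluded)."""
    d = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)[:, :k]

def edge_conv(feats, theta, k=4):
    """x_i' = sum_j h_Theta(x_i, x_j - x_i), with h = ReLU(linear) as a stand-in."""
    nbrs = knn_indices(feats, k)                        # graph rebuilt from current feats
    center = np.repeat(feats[:, None, :], k, axis=1)    # x_i       (global part)
    offset = feats[nbrs] - center                       # x_j - x_i (local part)
    edge_in = np.concatenate([center, offset], axis=2)  # shape (N, k, 2C)
    return np.maximum(edge_in @ theta, 0.0).sum(axis=1)
```

Calling `edge_conv` again on its own output recomputes the neighbours in the new feature space, which is the "dynamic" part of the method.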
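  • 33. Non-Euclidean (Point Clouds)
      • PointCNN [1]
      • Downsamples the information from neighborhoods into fewer representative points
      • Χ-Conv.: lower resolution, deeper channels (decreasing #representative points, deeper channels)
          • $\boldsymbol{x}_i' = \mathrm{Conv}\left(\boldsymbol{K}, \gamma_{\Theta}(\boldsymbol{P}_i - \boldsymbol{p}_i) \times \left[ h_{\Theta}(\boldsymbol{P}_i - \boldsymbol{p}_i), \boldsymbol{x}_i \right]\right)$
          • $\boldsymbol{K}$: kernel; $\boldsymbol{x}_i$: input feature; $h_{\Theta}$: MLP applied individually on each point, like PointNet; $[\cdot, \cdot]$: concatenation
      • [1] Y. Li et al., "PointCNN: Convolution On X-Transformed Points", 2018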
  • 34. Non-Euclidean (Mesh) • Representation of 3D data [1] • [1] E. Ahmed et al., "A survey on Deep Learning Advances on Different 3D Data Representations", 2018
  • 35. Non-Euclidean (Mesh)
      • Each Non-Euclidean Method (Mesh)
      • Columns: Method | Application | Link
          • MeshCNN | Classification, Segmentation | Paper, GitHub (PyTorch)
          • MeshNet | Classification | Paper, GitHub (PyTorch)
  • 36. Non-Euclidean (Mesh)
      • MeshCNN [1]
      • Edge collapse by pooling
      • Can be applied only to manifold meshes
      • Input feature (per edge): angle, length
      • Pooling / unpooling (use in segmentation)
      • [1] R. Hanocka et al., "MeshCNN: A Network with an Edge", 2018
  • 37. Non-Euclidean (Mesh)
      • MeshNet [1]
      • Input feature
          • Center, corner, normal, neighbor index (information of the neighborhood of each face)
      • Mesh Conv. (combination + aggregation)
      • [1] Y. Feng et al., "MeshNet: Mesh Neural Network for 3D Shape Representation", 2018
  • 38. Non-Euclidean (Graph) • Representation of 3D data [1] • [1] E. Ahmed et al., "A survey on Deep Learning Advances on Different 3D Data Representations", 2018
  • 39. Non-Euclidean (Graph)
      • Each Non-Euclidean Method (Graph)
      • Spectral / Spatial Method
      • Spectral: generalization of the Fourier basis
          • $\Delta \phi_i = \lambda_i \phi_i$
          • Euclidean (1D): $\phi_i = e^{i \omega x}$, $\lambda_i = \omega^2$
          • Non-Euclidean (manifold): Laplacian eigenfunctions of the manifold
      • Spatial: local coordinates
          • Patch operator: $D_j(x) f = \sum_{y \in N(x)} \omega_j(\boldsymbol{u}(x, y)) f(y)$, with pseudo-coordinate $\boldsymbol{u}(x, y)$
          • Convolution: $(f * g)(x) = \sum_{j=1}^{J} g_j D_j(x) f$
      • [1] M. M. Bronstein et al., "Geometric deep learning: going beyond Euclidean data", 2016
      • [2] J. Masci et al., "Geodesic convolutional neural networks on Riemannian manifolds", 2015
      • [3] M. Fey et al., "SplineCNN: Fast Geometric Deep Learning with Continuous B-Spline Kernels", 2017
  • 40. Non-Euclidean (Graph)
      • Each Non-Euclidean Method (Graph)
      • Spatial methods are more useful than spectral methods.
      • Columns: Method | Structure | Feature
          • Spectral | Fourier basis on the manifold; Laplacian eigenvalues / eigenvectors | Spectral filter coefficients are basis dependent in some methods; no locality in some methods; high computational cost
          • Spatial | Creates local coordinates; patch operator + Conv. | Locality; efficient computation
      • Note: Some equations on the following pages are taken from the PyTorch-geometric documentation (https://pytorch-geometric.readthedocs.io/en/latest/modules/nn.html). PyTorch-geometric is explained on a later page.
  • 41. Non-Euclidean (Graph)
      • Each Non-Euclidean Method (Graph)
      • Spectral, Spectral free Method
      • Columns: Method | Type | Application | Link
          • Spectral CNN | Spectral | Graph | Paper
          • Chebyshev Spectral CNN (ChebNet) | Spectral free | Graph | Paper, GitHub (TensorFlow), PyTorch-geometric (ChebConv)
          • Graph Convolutional Network (GCN) | Spectral free | Graph | Paper, PyTorch-geometric (GCNConv)
          • Graph Neural Network (GNN) | Spectral free | Graph | Paper
  • 42. Non-Euclidean (Graph)
      • Spectral CNN [2]
      • Cannot be used across different shapes (different shape -> different basis -> different result)
          • Spectral filter coefficients are basis dependent
      • High computational cost
      • No locality
      • Laplacian: $(\Delta f)_i \propto \sum_{(i,j) \in \mathcal{E}} \omega_{ij} (f_i - f_j)$
      • $\boldsymbol{f}_{\ell}^{out} = \xi\left( \sum_{\ell'=1}^{p} \boldsymbol{\Phi}_k \boldsymbol{G}_{\ell,\ell'} \boldsymbol{\Phi}_k^T \boldsymbol{f}_{\ell'}^{in} \right)$, with $\boldsymbol{\Phi}_k$: Laplacian eigenvectors, $\xi$: ReLU
      • [1] M. M. Bronstein et al., "Geometric deep learning: going beyond Euclidean data", 2016
      • [2] J. Bruna et al., "Spectral Networks and Locally Connected Networks on Graphs", 2013
  • 43. Non-Euclidean (Graph)
      • Chebyshev Spectral CNN (ChebNet) [1]
          • Does not calculate the Laplacian eigenvectors directly
          • Locality ($K$ hops)
          • Approximates the filter as a polynomial
          • $X' = \sum_{k=0}^{K-1} Z^{(k)} \cdot \Theta^{(k)}$
          • $Z^{(0)} = X$, $Z^{(1)} = \hat{L} \cdot X$, $Z^{(k)} = 2 \cdot \hat{L} \cdot Z^{(k-1)} - Z^{(k-2)}$
          • $\hat{L}$: scaled and normalized Laplacian
      • Graph Convolutional Network (GCN) [2]
          • Special version of ChebNet ($K = 2$)
      • [1] M. Defferrard et al., "Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering", 2016
      • [2] T. N. Kipf et al., "Semi-Supervised Classification with Graph Convolutional Networks", 2016
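The Chebyshev recurrence on this slide can be sketched in NumPy as below. The scaled Laplacian is assumed to be given (a common convention is $2L / \lambda_{max} - I$), and `thetas` is a hypothetical list of $K$ weight matrices; this is an illustration, not the ChebConv implementation of PyTorch-geometric.

```python
import numpy as np

def cheb_conv(X, L_hat, thetas):
    """X' = sum_{k=0}^{K-1} Z^(k) Theta^(k) with the Chebyshev recurrence for Z^(k)."""
    Z_prev, Z = None, X                       # Z^(0) = X
    out = Z @ thetas[0]
    for k in range(1, len(thetas)):
        if k == 1:
            Z_prev, Z = Z, L_hat @ Z          # Z^(1) = L_hat X
        else:
            # Z^(k) = 2 L_hat Z^(k-1) - Z^(k-2)
            Z_prev, Z = Z, 2.0 * (L_hat @ Z) - Z_prev
        out = out + Z @ thetas[k]
    return out
```

Only products with the sparse Laplacian are needed, so no eigendecomposition is computed, and $Z^{(k)}$ mixes information from at most $k$ hops.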
  • 44. Non-Euclidean (Graph)
      • Each Non-Euclidean Method (Graph)
      • Charting
      • Columns: Method | Data | Application | Link
          • Geodesic CNN | Mesh | Shape retrieval / correspondence | Paper
          • Anisotropic CNN | Mesh / point cloud | Shape correspondence | Paper
          • MoNet | Graph / mesh / point cloud | Shape correspondence | Paper, PyTorch-geometric (GMMConv)
          • SplineCNN | Graph / mesh | Classification, Shape correspondence | Paper, GitHub (PyTorch), PyTorch-geometric (SplineConv)
          • FeaStNet | Graph / mesh | Shape correspondence, Segmentation | Paper, PyTorch-geometric (FeaStConv)
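  • 45. Non-Euclidean (Graph)
      • Geodesic CNN (GCNN) [1] ⊂ Anisotropic CNN (ACNN) [2] ⊂ MoNet [3]
      • Patch operator: $D_j(x) f = \sum_{y \in N(x)} \omega_j(\boldsymbol{u}(x, y)) f(y)$, with pseudo-coordinate $\boldsymbol{u}(x, y)$
      • Convolution: $(f * g)(x) = \sum_{j=1}^{J} g_j D_j(x) f$
      • MoNet: $\omega_j(\boldsymbol{u}) = \exp\left( -\frac{1}{2} (\boldsymbol{u} - \mu_j)^T \Sigma_j^{-1} (\boldsymbol{u} - \mu_j) \right)$
          • $\mu_j$, $\Sigma_j$: learning parameters
      • ACNN: $\omega_j(\boldsymbol{u}) = \exp\left( -\frac{1}{2} \boldsymbol{u}^T \boldsymbol{R}_{\theta_j} \begin{pmatrix} \alpha & 0 \\ 0 & 1 \end{pmatrix} \boldsymbol{R}_{\theta_j}^T \boldsymbol{u} \right)$
          • $\boldsymbol{R}_{\theta_j}$: rotation by $\theta$ to the maximum curvature direction; $\alpha$: the degree of anisotropy
      • GCNN: $\omega_j(\boldsymbol{u}) = \exp\left( -\frac{1}{2} (\boldsymbol{u} - u_j)^T \begin{pmatrix} \sigma_\rho^2 & 0 \\ 0 & \sigma_\theta^2 \end{pmatrix}^{-1} (\boldsymbol{u} - u_j) \right)$
          • $\sigma_\rho^2$, $\sigma_\theta^2$: covariance (radius, angle direction)
      • [1] J. Masci et al., "Geodesic convolutional neural networks on Riemannian manifolds", 2015
      • [2] D. Boscaini et al., "Learning shape correspondence with anisotropic convolutional neural networks", 2016
      • [3] F. Monti et al., "Geometric deep learning on graphs and manifolds using mixture model CNNs", 2016
      • [4] M. Fey et al., "SplineCNN: Fast Geometric Deep Learning with Continuous B-Spline Kernels", 2017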
  • 46. Non-Euclidean (Graph)
      • Geodesic CNN (GCNN)
          • Creates local coordinates
          • Does not verify that the chart is meaningful (the chart radius needs to be small)
      • Anisotropic CNN (ACNN)
          • The Fourier basis is based on the anisotropic heat diffusion eq.
      • MoNet
          • Learns the filter as a parametric kernel
          • Generalization of geodesic CNN and anisotropic CNN
  • 47. Non-Euclidean (Graph)
      • SplineCNN [1]
      • Filter based on B-spline functions
      • Efficient computation
      • $\boldsymbol{x}_i' = \frac{1}{|N(i)|} \sum_{j \in N(i)} \boldsymbol{x}_j \cdot h_{\boldsymbol{\Theta}}(\boldsymbol{e}_{i,j})$, with $h_{\boldsymbol{\Theta}}$: weighted B-spline basis
      • [1] M. Fey et al., "SplineCNN: Fast Geometric Deep Learning with Continuous B-Spline Kernels", 2017
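  • 48. Non-Euclidean (Graph)
      • FeaStNet [1]
      • Dynamically determines the relation b/w the filter weights and the local graph of a node
      • Euclidean (pixel grid): $\boldsymbol{x}_i' = \sum_{m=1}^{M} \boldsymbol{W}_m \boldsymbol{x}_{n(m,i)}$
          • Filter size e.g. $M = 3 \times 3 = 9$; weight $\boldsymbol{W}_m$ maps $D$ input features to $E$ output features
      • FeaStNet: $\boldsymbol{x}_i' = \frac{1}{|N(i)|} \sum_{j \in N(i)} \sum_{m=1}^{M} q_m(\boldsymbol{x}_i, \boldsymbol{x}_j) \boldsymbol{W}_m \boldsymbol{x}_j$
          • $q_m(\boldsymbol{x}_i, \boldsymbol{x}_j) = \mathrm{softmax}_j\left( \boldsymbol{u}_m^T (\boldsymbol{x}_i - \boldsymbol{x}_j) + c_m \right)$
          • #neighbor e.g. $N = 6$
      • [1] N. Verma et al., "FeaStNet: Feature-Steered Graph Convolutions for 3D Shape Analysis", 2017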
  • 49. Non-Euclidean (Graph)
      • PyTorch-geometric
          • https://github.com/rusty1s/pytorch_geometric
          • Library based on PyTorch
          • For point clouds and meshes (not only graphs)
          • Includes code for point cloud and graph-type approaches
              • PointNet++, DGCNN, PointCNN
              • ChebNet, GCN, MoNet, SplineCNN, FeaStNet
          • Easy to download well-known sample datasets and convert them to a common data format
              • ModelNet, ShapeNet, etc.
          • Many examples and benchmarks
  • 50. Accuracy
      • Accuracy (Classification)
      • Around 90% for every method (except VoxNet)
      • Columns: Method | "ModelNet40" overall acc. [%] / mean class acc. [%] | "SHREC" overall acc. [%] (values highlighted on the slide are marked with *)
          • VoxNet | 85.9 / 83.0 | -
          • MVCNN | - / 90.1 | 96.09*
          • PointNet | 89.2 / 86.0 | -
          • PointNet++ | 90.7 / - | -
          • DGCNN | 92.9* / 90.2* | -
          • PointCNN | 92.2 / 88.1 | -
          • MeshNet | - / 91.9 | -
          • MeshCNN | - / - | 91.0
      • Note: Please refer to each paper (cited on each page) for details.
  • 51. Accuracy
      • Accuracy (Segmentation)
      • Columns: Method | Part segmentation "ShapeNet" mIoU (mean per-class part-averaged IoU) [%] | Part segmentation "ScanNet" acc. [%] | Part segmentation "COSEG" acc. [%] | Scene segmentation "S3DIS" acc. [%] / mIoU [%] | Human body segmentation (including SCAPE, FAUST etc.) acc. [%] (values highlighted on the slide are marked with *)
          • PointNet | 80.4 (57.9 to 95.3) | 73.9 | 54.4 to 91.5 | 78.6 / - | 90.77
          • PointNet++ | 81.9 (58.7 to 95.3) | 84.5 | 79.1 to 98.9 | - / - | -
          • DGCNN | 82.3 (63.5 to 95.7)* | - | - | 84.1* / 56.1 | -
          • PointCNN | 84.6* | 85.1* | - | - / 65.39* | -
          • MeshCNN | - | - | 97.56 to 99.63* | - / - | 92.30*
          • FeaStNet | 81.5 | - | - | - / - | -
      • Note: Please refer to each paper (cited on each page) for details.
  • 52. Dataset
      • 3D Dataset
      • Columns: Dataset | Contents | Data Format | Purpose | PyTorch-geometric
          • ModelNet10/40 | 3D CAD model (10 or 40 classes) | Mesh (.OFF) | Classification | ModelNet
          • ShapeNet | 3D shape | Point cloud (.pts) | Segmentation | ShapeNet
          • ScanNet | Indoor scan data | Mesh (.ply) | Segmentation | -
          • S3DIS (original, .h5) | Indoor scan data | Point cloud | Segmentation | S3DIS
      • ScanNet: registration required
      • S3DIS: registration required (for original)
  • 53. Dataset
      • 3D Dataset
      • Columns: Dataset | Contents | Data Format | Purpose | PyTorch-geometric
          • SHREC | Many types, one for each contest | - | Retrieval | -
          • SHREC2016 | Animal, human (part data) | Mesh (.OFF) | Correspondence | SHREC2016
          • TOSCA | Animal, human | Mesh (same #vertices in each category, separate files for vertices and triangles) | Correspondence | TOSCA
          • PCPNet | 3D shape | Point cloud (.xyz), including normal and curvature files | Estimation of local shape (normal, curvature) | PCPNet
          • FAUST | Human body | Mesh | Correspondence | FAUST
      • FAUST (Note): registration required
  • 54. Material
      • Material of 3D deep learning (3D / point cloud)
      • Columns: Paper | Comment
          • A survey on Deep Learning Advances on Different 3D Data Representations | Review of 3D deep learning; easy to read; written from the point of view of Euclidean and non-Euclidean methods
          • Paperwithcode | Papers w/ code about 3D
          • Point Cloud Deep Learning Survey Ver. 2 | Deep learning for point clouds; survey of many papers
  • 55. Material
      • Material of 3D deep learning (graph)
      • Columns: Paper | Comment
          • Geometric deep learning: going beyond Euclidean data | Review of geometric deep learning
          • Geometric Deep Learning | Summary of papers and code about geometric deep learning
          • Geometric Deep Learning on Graphs and Manifolds (NIPS2017) | Presentation (YouTube) about geometric deep learning
  • 56. Summary
      • There are many methods of 3D deep learning.
      • Two main categories
          • Euclidean vs Non-Euclidean
          • Euclidean Method
              • Projections / Multi-View / Voxel
          • Non-Euclidean Method
              • Point Cloud / Mesh / Graph
      • Each method has merits and demerits.
          • We need to choose the method that best fits each data type and application.
      • Research on 3D deep learning is growing.
  • 57. Appendix • Appendix • Mesh Generation • Laplacian on Graph • Correspondence
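  • 58. Appendix : Mesh Generation
      • Mesh Generation
      • This appendix summarizes the following materials.
      • Columns: Link | Contents
          • 点群面張り(精密工学会) (Surface meshing of point clouds, Japan Society for Precision Engineering) | Surface reconstruction
          • メッシュ処理(精密工学会) (Mesh processing, Japan Society for Precision Engineering) | Mesh processing
          • CV勉強会@関東発表資料 点群再構成に関するサーベイ (CV study group @ Kanto presentation: survey on point cloud reconstruction) | Survey of point cloud reconstruction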
  • 59. Appendix : Mesh Generation
      • Difficulty of Mesh Generation
      • Columns: Processing | Difficulty
          • Pre-processing | Reduction of noise / missing values / abnormal values / density differences of vertices
          • Post-processing | Mesh smoothing / hole filling
      • Figure panels: ground truth; noise / abnormal value; missing / density difference of vertices; mesh smoothing / hole filling
  • 60. Appendix : Mesh Generation
      • Kinds of Mesh Generation
      • Columns: Kind | Feature | Classification of the method
          • Direct Triangulation | Direct mesh generation from the point cloud | Explicit method
          • Surface Smoothness | Smooth surface mesh from the point cloud | Implicit method
  • 61. Appendix : Mesh Generation
      • Classification of the method
      • In general, the implicit method is easier to use, since point clouds contain noise.
      • Columns: Classification of the method | Information to use | Influence of noise and density of vertices | Guarantee of accuracy
          • Explicit method | Vertices | Large (error of vertices = error of meshes) | ◎
          • Implicit method | Meshes based on the isosurface of a function field calculated from the vertices | Small | 〇
  • 62. Appendix : Mesh Generation
      • Kinds of Mesh Generation (Detail)
      • Direct Triangulation (examples of built-in functions in MeshLab)
      • Columns: Method | Feature
          • Voronoi-Based Surface Reconstruction | Creation of the Delaunay diagram after adding the vertices obtained from the Voronoi diagram
          • Ball-Pivoting Algorithm | Rolls a ball over the point cloud and generates the mesh from the points located within a certain distance
  • 63. Appendix : Mesh Generation
      • Voronoi-Based Surface Reconstruction [1]
      • Voronoi diagram
          • Regions divided by the bisectors b/w vertices (in 2D)
      • Delaunay triangulation
          • Triangulation obtained by connecting the vertices across each bisector
      • Example in 2D: vertices (S), Voronoi vertices (V), Delaunay triangulation of S and V (black + red lines), surface (black line)
      • [1] N. Amenta et al., "A New Voronoi-Based Surface Reconstruction Algorithm", 1998
  • 64. Appendix : Mesh Generation
      • Ball-Pivoting Algorithm [1]
      • Figure panels: ideal data; sparse data (mesh not created); close point cloud (mesh not created)
      • [1] F. Bernardini et al., "The Ball-Pivoting Algorithm for Surface Reconstruction", 1999
  • 65. Appendix : Mesh Generation
      • Kinds of Mesh Generation (Detail)
      • Surface Smoothness (examples of built-in functions in MeshLab)
      • Columns: Method | Feature
          • Signed distance function + Marching Cubes | Creation of a signed distance function from the distance b/w vertices and surface + mesh generation with Marching Cubes
          • Screened Poisson surface reconstruction (Poisson surface reconstruction) | Distinguishes b/w the inside and outside of the surface by using the Poisson eq.
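  • 66. Appendix : Mesh Generation
      • Signed distance function + Marching Cubes [1]
      • Steps: oriented tangent planes -> estimated signed distance -> output of modified marching cubes
      • $f(\boldsymbol{p}) = (\boldsymbol{p} - \boldsymbol{o}) \cdot \boldsymbol{n}$, with $\boldsymbol{o}$: center of the tangent plane, $\boldsymbol{n}$: its normal
          • $f(\boldsymbol{p}) > 0 \to$ outside
          • $f(\boldsymbol{p}) = 0 \to$ surface
          • $f(\boldsymbol{p}) < 0 \to$ inside
      • [1] H. Hoppe et al., "Surface Reconstruction from Unorganized Points", 1992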
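The signed distance $f(\boldsymbol{p}) = (\boldsymbol{p} - \boldsymbol{o}) \cdot \boldsymbol{n}$ can be sketched as below, assuming NumPy and assuming the oriented tangent planes (centers and unit normals) have already been estimated. The choice of the plane whose center is nearest to $\boldsymbol{p}$ and the function name are illustrative.

```python
import numpy as np

def signed_distance(p, centers, normals):
    """f(p) = (p - o) . n using the tangent plane (o, n) whose center is nearest to p."""
    nearest = np.argmin(np.linalg.norm(centers - p, axis=1))
    return float(np.dot(p - centers[nearest], normals[nearest]))
```

Evaluating this function on a regular grid gives the scalar field whose zero level set Marching Cubes turns into a mesh.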
  • 67. Appendix: Mesh Generation
    • Screened Poisson surface reconstruction
    • Get the indicator function $\chi$ by solving the Poisson eq.
      • Poisson eq.: $\Delta \chi \equiv \nabla \cdot \nabla \chi = \nabla \cdot \boldsymbol{V}$
    • Figure: Poisson surface reconstruction [1]; Screened Poisson surface reconstruction [2]; interpolation between the points of the point cloud
    [1] M. Kazhdan et al. "Poisson Surface Reconstruction", 2006
    [2] M. Kazhdan et al. "Screened Poisson Surface Reconstruction", 2013
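  • 68. Appendix: Laplacian on Graph
    • Laplacian on Graph [1]
    • Graph (undirected): $(\mathcal{V}, \mathcal{E})$, $\mathcal{V} = \{1, \dots, n\}$, $\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$
    • Weights: $a_i$ (vertices), $\omega_{ij}$ (edges)
    • Functions: $f: \mathcal{V} \to \mathbb{R}$, $F: \mathcal{E} \to \mathbb{R}$
    • Grad.: $(\nabla f)_{ij} = f_i - f_j$
    • Div.: $(\mathrm{div}\, F)_i = \frac{1}{a_i} \sum_{j:(i,j) \in \mathcal{E}} \omega_{ij} F_{ij}$
    • Laplacian: $(\Delta f)_i = \frac{1}{a_i} \sum_{(i,j) \in \mathcal{E}} \omega_{ij} (f_i - f_j)$
      • $\Delta \equiv -\mathrm{div}\, \nabla$ (sign convention of [1]) → Laplacian eigenvalues $\lambda \ge 0$
    • Mesh (faces $\mathcal{F}$, edge lengths $\ell_{ij}$):
      • $\omega_{ij} = \frac{-\ell_{ij}^2 + \ell_{jk}^2 + \ell_{ki}^2}{8 a_{ijk}} + \frac{-\ell_{ij}^2 + \ell_{jh}^2 + \ell_{hi}^2}{8 a_{ijh}} = \frac{1}{2} (\cot \alpha_{ij} + \cot \beta_{ij})$
      • $a_i = \frac{1}{3} \sum_{jk:(i,j,k) \in \mathcal{F}} a_{ijk}$
      • $a_{ijk} = \sqrt{s_{ijk} (s_{ijk} - \ell_{ij}) (s_{ijk} - \ell_{jk}) (s_{ijk} - \ell_{ki})}$
      • $s_{ijk} = \frac{1}{2} (\ell_{ij} + \ell_{jk} + \ell_{ki})$
    [1] M. M. Bronstein et al., "Geometric deep learning: going beyond Euclidean data", 2016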
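The mesh edge weight on this slide can be checked numerically with a small sketch. It assumes the semi-perimeter is taken over the three edge lengths (Heron's formula); the function names are hypothetical, and the two test configurations (two equilateral triangles, a unit square split along its diagonal) are illustrative choices, not examples from [1].

```python
import math


def heron_area(l_ij, l_jk, l_ki):
    """Triangle area a_ijk from its three edge lengths (Heron's formula)."""
    s = 0.5 * (l_ij + l_jk + l_ki)  # semi-perimeter s_ijk
    return math.sqrt(s * (s - l_ij) * (s - l_jk) * (s - l_ki))


def half_cot_opposite(l_ij, l_jk, l_ki):
    """(1/2) cot of the angle opposite edge ij, written with edge lengths only."""
    return (-l_ij**2 + l_jk**2 + l_ki**2) / (8.0 * heron_area(l_ij, l_jk, l_ki))


def cotangent_weight(l_ij, l_jk, l_ki, l_jh, l_hi):
    """omega_ij for edge ij shared by triangles (i, j, k) and (i, j, h)."""
    return half_cot_opposite(l_ij, l_jk, l_ki) + half_cot_opposite(l_ij, l_jh, l_hi)


# Two equilateral triangles: both opposite angles are 60 degrees,
# so omega_ij = (1/2) * (cot 60 + cot 60) = 1 / sqrt(3).
w_equilateral = cotangent_weight(1.0, 1.0, 1.0, 1.0, 1.0)

# Unit square split along its diagonal ij: both opposite angles are 90 degrees,
# so the weight of the diagonal is (numerically) zero.
diag = math.sqrt(2.0)
w_diagonal = cotangent_weight(diag, 1.0, 1.0, 1.0, 1.0)

print(w_equilateral, w_diagonal)
```

Working from edge lengths alone means the weights depend only on the intrinsic geometry of the mesh, not on how it is embedded in space.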
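  • 69. Appendix: Laplacian on Graph
    • Laplacian on Graph [1]
    • Laplacian (as matrix): $\Delta \boldsymbol{f} = \boldsymbol{A}^{-1} (\boldsymbol{D} - \boldsymbol{W}) \boldsymbol{f}$
      • $\boldsymbol{f} = (f_1, \dots, f_n)^T$
      • $\boldsymbol{W} = (\omega_{ij})$
      • $\boldsymbol{A} = \mathrm{diag}(a_1, \dots, a_n)$
      • $\boldsymbol{D} = \mathrm{diag}\left(\sum_{j: j \ne i} \omega_{ij}\right)$
    Laplacian | $\Delta$ | Condition
    Unnormalized graph Laplacian | $\Delta = \boldsymbol{D} - \boldsymbol{W}$ | $\boldsymbol{A} = \boldsymbol{I}$
    Normalized symmetric Laplacian | $\Delta = \boldsymbol{I} - \boldsymbol{D}^{-1/2} \boldsymbol{W} \boldsymbol{D}^{-1/2}$ | $\boldsymbol{A} = \boldsymbol{D}$ + normalization
    Random walk Laplacian | $\Delta = \boldsymbol{I} - \boldsymbol{D}^{-1} \boldsymbol{W}$ | $\boldsymbol{A} = \boldsymbol{D}$
    [1] M. M. Bronstein et al., "Geometric deep learning: going beyond Euclidean data", 2016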
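As a rough illustration of the matrix form, the sketch below applies the Laplacian to a function on a 3-vertex path graph using plain Python lists. The graph, the values of `f`, and the function name `laplacian_apply` are assumptions chosen for the example; choosing all vertex weights equal to 1 gives the unnormalized Laplacian, and choosing them equal to the degrees gives the random walk Laplacian.

```python
def laplacian_apply(W, a, f):
    """Return A^{-1} (D - W) f, i.e. (Delta f)_i = (1/a_i) sum_j w_ij (f_i - f_j)."""
    n = len(f)
    out = []
    for i in range(n):
        d_i = sum(W[i][j] for j in range(n) if j != i)            # degree D_ii
        wf_i = sum(W[i][j] * f[j] for j in range(n) if j != i)    # (W f)_i
        out.append((d_i * f[i] - wf_i) / a[i])
    return out


# Path graph 0 - 1 - 2 with unit edge weights (illustrative choice).
W = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
f = [1, 2, 4]
degrees = [sum(row) for row in W]

unnormalized = laplacian_apply(W, [1, 1, 1], f)   # A = I
random_walk = laplacian_apply(W, degrees, f)      # A = D
constant = laplacian_apply(W, [1, 1, 1], [5, 5, 5])

print(unnormalized)  # [-1.0, -1.0, 2.0]
print(random_walk)   # [-1.0, -0.5, 2.0]
print(constant)      # [0.0, 0.0, 0.0]
```

The last line shows that a constant function is mapped to zero, so it is an eigenvector with eigenvalue 0.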
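  • 70. Appendix: Laplacian on Graph
    • Laplacian on Graph [1]
    • Convolution: $(f * g)(x) = \sum_{i \ge 0} \hat{f}_i \hat{g}_i \phi_i(x)$, where $\phi_i$ are the Laplacian eigenfunctions and $\hat{f}_i = \langle f, \phi_i \rangle$, $\hat{g}_i = \langle g, \phi_i \rangle$
    • Matrix conv.: $\boldsymbol{f} * \boldsymbol{g} = \boldsymbol{\Phi}\, \mathrm{diag}(\hat{\boldsymbol{g}})\, \boldsymbol{\Phi}^T \boldsymbol{f}$
      • $\boldsymbol{f} = (f_1, \dots, f_n)^T$
      • $\hat{\boldsymbol{g}} = (\hat{g}_1, \dots, \hat{g}_n)$
      • $\boldsymbol{\Phi} = (\phi_1, \dots, \phi_n)$
    [1] M. M. Bronstein et al., "Geometric deep learning: going beyond Euclidean data", 2016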
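The step from the sum over eigenfunctions to the matrix form can be spelled out as follows. This is a sketch that assumes a symmetric Laplacian, so that the eigenvectors collected in $\boldsymbol{\Phi}$ are orthonormal; the symbol $\odot$ (elementwise product) is introduced here for illustration only.

```latex
\begin{align*}
\hat{\boldsymbol{f}} &= \boldsymbol{\Phi}^T \boldsymbol{f}
  && \text{(forward transform: } \hat{f}_i = \langle f, \phi_i \rangle \text{)} \\
\widehat{\boldsymbol{f} * \boldsymbol{g}} &= \hat{\boldsymbol{g}} \odot \hat{\boldsymbol{f}}
  = \mathrm{diag}(\hat{\boldsymbol{g}})\, \boldsymbol{\Phi}^T \boldsymbol{f}
  && \text{(multiplication in the spectral domain)} \\
\boldsymbol{f} * \boldsymbol{g} &= \boldsymbol{\Phi}\, \mathrm{diag}(\hat{\boldsymbol{g}})\, \boldsymbol{\Phi}^T \boldsymbol{f}
  && \text{(inverse transform: } \textstyle\sum_i \hat{f}_i \hat{g}_i \phi_i \text{)}
\end{align*}
```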
  • 71. Appendix: Correspondence
    • Correspondence [1]
    • Input: query vertex $x_i$
    • Labels: reference vertices $(y_1, \dots, y_N)$
      • Each query vertex is classified using all reference vertices as the possible labels
    • Output (probability): $(p_1, p_2, \dots, p_N)$
    • Correct label: one-hot vector $(1, 0, \dots, 0)$
    • Loss: computed between the output (probability) and the correct label
    [1] M. M. Bronstein et al., "Geometric deep learning: going beyond Euclidean data", 2016
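The slide does not name the loss, so the sketch below assumes the usual choice for this kind of per-vertex classification: a softmax over the N reference vertices followed by cross-entropy against the one-hot label. The scores and the function names are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw per-reference-vertex scores into probabilities (p_1, ..., p_N)."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(p, one_hot):
    """Cross-entropy between predicted probabilities and a one-hot label."""
    return -sum(t * math.log(q) for q, t in zip(p, one_hot) if t > 0)


# One query vertex, N = 3 reference vertices; the first one is the correct match.
scores = [2.0, 0.0, 0.0]           # hypothetical network outputs
p = softmax(scores)
loss_correct = cross_entropy(p, [1, 0, 0])
loss_wrong = cross_entropy(p, [0, 1, 0])

print(p, loss_correct, loss_wrong)
```

The loss is smaller when the probability mass sits on the correct reference vertex, and it reaches zero only when the output equals the one-hot vector.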