[CVPR 2020 Meta-Survey] 3D From a Single Image and Shape-From-X

CVPR 2020 Trends, Observations, and Meta-Survey
相澤宏旭，園山昌司，寺田英雄 
Group 1: 3D From a Single Image and Shape-From-X

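3D From a Single Image and Shape-From-X
• CVPR 2020 trends, observations, and meta-survey
– We picked out the papers and topics that stood out at CVPR 2020, along with the areas that look likely to gain traction next.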
• Differentiable Renderer 
• Single View 3D Reconstruction 
• Implicit Function 
• 3D Action Recognition 
• Unsupervised 3D Representation Learning 
• 3D Image Manipulation 
• Single Image Depth Estimation 
• Monocular Depth Estimation 
• Viewpoint Estimation 
• Part Decomposition 
• Learnable Convex Decomposition 
• Spatial Reasoning

Differentiable Renderer
• A differentiable renderer that bridges 2D and 3D
– Making the rendering techniques developed in computer graphics differentiable allows them to be used in deep learning
• Image recognition is about inferring 3D space from 2D images.
• Understanding 3D space leads to understanding 2D images!
Proposes a rendering method for colored voxels
http://www.krematas.com/nvr/index.html
(Supplementary)
Neural Rendering workshop
https://www.neuralrender.com/
Survey paper on neural rendering
https://arxiv.org/abs/2004.03805
Survey paper on differentiable rendering
https://arxiv.org/abs/2006.12057
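As a rough illustration of what "making rendering differentiable" can mean, here is a minimal sketch that is not taken from any of the papers above. It assumes a toy scene (a single disk), a sigmoid-softened silhouette, and a squared-error loss; the function names, image size, and softness parameter `tau` are all hypothetical. Because pixel coverage varies smoothly with the radius, the radius can be recovered from a target silhouette by gradient descent.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def render_soft_disk(radius, size=16, tau=0.5):
    """Soft silhouette of a disk centred in a size x size image.

    A hard rasterizer would output 1 inside and 0 outside, which gives a
    zero gradient almost everywhere. The sigmoid makes coverage change
    smoothly with the radius (tau controls the softness of the edge).
    """
    c = (size - 1) / 2.0
    return [[sigmoid((radius - math.hypot(x - c, y - c)) / tau)
             for x in range(size)] for y in range(size)]


def loss_and_grad(radius, target, size=16, tau=0.5):
    """Squared-error silhouette loss and its analytic derivative w.r.t. radius."""
    image = render_soft_disk(radius, size, tau)
    loss, grad = 0.0, 0.0
    for row, target_row in zip(image, target):
        for s, t in zip(row, target_row):
            loss += (s - t) ** 2
            # Chain rule: d sigmoid(u) / d radius = s * (1 - s) / tau
            grad += 2.0 * (s - t) * s * (1.0 - s) / tau
    return loss, grad


def fit_radius(target, radius=2.0, lr=0.02, steps=200):
    """Plain gradient descent on the radius, driven only by the 2D silhouette."""
    for _ in range(steps):
        _, grad = loss_and_grad(radius, target)
        radius -= lr * grad
    return radius


target = render_soft_disk(5.0)   # the "observed" 2D silhouette
fitted = fit_radius(target)      # radius recovered through the renderer
print(round(fitted, 3))
```

Real differentiable renderers handle meshes or voxels, cameras, and occlusion; the sketch only shows the core idea that a smooth image formation model lets a 2D loss update scene parameters.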

Single-View 3D Reconstruction
• 3D reconstruction from a single image
– Major progress thanks to the arrival of differentiable renderers
• 3D reconstruction can be learned from object silhouettes, without ground-truth 3D shapes or camera poses
– Targets now include general objects with textures
Textured 3D mesh reconstruction of general objects
https://openaccess.thecvf.com/content_CVPR_2020/html/Henderson_Leveraging_2D_Data_to_Learn_Textured_3D_Mesh_Generation_CVPR_2020_paper.html

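Implicit Function
• Represent 3D shape and texture as a function!
– A new 3D representation that replaces voxels, meshes, and point clouds
• Those representations have weaknesses such as low resolution, sparsity, and discontinuity
– A 3D shape is represented by a function that indicates whether a given point lies inside or outside the shape
Represents the shape of articulated objects with an implicit function
https://openaccess.thecvf.com/content_CVPR_2020/html/Henderson_Leveraging_2D_Data_to_Learn_Textured_3D_Mesh_Generation_CVPR_2020_paper.html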
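The inside/outside idea can be sketched in a few lines. This is an illustrative toy, not the model from the paper above: the shape (a union of two spheres), the function names, and the sampling bounds are assumptions, and a learned implicit function would replace the hand-written `occupancy` with a neural network. The point is that the same function can be queried at any resolution, unlike a fixed voxel grid.

```python
import math


def sphere_sdf(point, center, radius):
    """Signed distance to a sphere: negative inside, positive outside."""
    return math.dist(point, center) - radius


def occupancy(point):
    """Return 1 if the point is inside the shape, else 0.

    The shape here is a hypothetical union of two spheres.
    """
    d = min(sphere_sdf(point, (0.0, 0.0, 0.0), 1.0),
            sphere_sdf(point, (1.2, 0.0, 0.0), 0.6))
    return 1 if d < 0.0 else 0


def estimate_volume(resolution, lo=-2.0, hi=2.0):
    """Query the function at the cell centres of a resolution^3 grid."""
    step = (hi - lo) / resolution
    coords = [lo + (i + 0.5) * step for i in range(resolution)]
    inside = sum(occupancy((x, y, z))
                 for x in coords for y in coords for z in coords)
    return inside * step ** 3


coarse = estimate_volume(8)    # same function, coarse sampling
fine = estimate_volume(32)     # same function, finer sampling
print(coarse, fine)
```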

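3D Action Recognition
• The 3D wave is reaching action recognition too
– Human actions take place in 3D space, so taking 3D space into account matters for action recognition as well!
– Recognition methods that account for 3D are increasing
Represents a motion, together with the 3D location where it takes place, as a 3D Dynamic Voxel
https://openaccess.thecvf.com/content_CVPR_2020/html/Wang_3DV_3D_Dynamic_Voxel_for_Action_Recognition_in_Depth_Video_CVPR_2020_paper.html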

Unsupervised 3D Representation Learning
• Unsupervised learning of representations of an object's 3D shape
– Exploits symmetry to decompose an image into depth, albedo, viewpoint, and illumination without supervision
• A differentiable renderer is used here as well
http://openaccess.thecvf.com/content_CVPR_2020/html/Wu_Unsupervised_Learning_of_Probably_Symmetric_Deformable_3D_Objects_From_Images_CVPR_2020_paper.html

3D Image Manipulation
• Generative models that allow image editing in 3D space
– Generative models for 3D image editing have appeared, covering object position and rotation in 3D space, camera pose, and so on
– The next step beyond photorealistic image generation
3D image editing for multiple objects
http://openaccess.thecvf.com/content_CVPR_2020/html/Liao_Towards_Unsupervised_Learning_of_Generative_Models_for_3D_Controllable_Image_CVPR_2020_paper.html

Single Image Depth Estimation
• Depth estimation from a single image
– A challenging problem setting that depends heavily on the training data.
– Performance improves when object detection or segmentation is used.
– A different approach from Single-View 3D Reconstruction.
Improved estimation through segmentation
https://openaccess.thecvf.com/content_CVPR_2020/html/Wang_SDC-Depth_Semantic_Divide-and-Conquer_Network_for_Monocular_Depth_Estimation_CVPR_2020_paper.html
Improved estimation through plane detection
https://openaccess.thecvf.com/content_CVPR_2020/html/Jiang_Peek-a-Boo_Occlusion_Reasoning_in_Indoor_Scenes_With_Plane_Representations_CVPR_2020_paper.html

Single Image Depth Estimation (dataset)
• Commonly used datasets
– Indoor datasets with dense depth tend to be preferred over outdoor datasets with sparse depth aimed at autonomous driving and SLAM (is outdoor too hard?).
– Because tasks other than depth estimation are also used, multi-task datasets get exploited to the full.
Indoor
• NYUv2 https://cs.nyu.edu/~silberman/datasets/nyu_depth_v2.html
• Matterport3D https://niessner.github.io/Matterport/
Outdoor
• KITTI http://www.cvlibs.net/datasets/kitti/
• Cityscapes https://www.cityscapes-dataset.com/
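Results on depth datasets like these are often reported with error measures such as absolute relative error, RMSE, and the fraction of pixels within a ratio threshold of 1.25. The sketch below is an assumption about a typical evaluation, not something specified on the slide: the function name and the convention that a ground-truth value of 0 marks a pixel with no measurement (as happens with sparse depth) are illustrative choices, and predictions are assumed to be strictly positive.

```python
import math


def depth_metrics(pred, gt):
    """Compute AbsRel, RMSE, and delta < 1.25 over pixels with valid ground truth.

    pred, gt: flat lists of depths in metres. gt == 0 is treated as
    "no measurement" and skipped, so sparse and dense ground truth are
    handled the same way. Predictions are assumed to be > 0.
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no valid ground-truth pixels")
    n = len(pairs)
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    delta1 = sum(1 for p, g in pairs if max(p / g, g / p) < 1.25) / n
    return {"abs_rel": abs_rel, "rmse": rmse, "delta1": delta1}


# Second pixel has no ground truth and is ignored.
metrics = depth_metrics(pred=[2.2, 3.0, 4.0, 5.0], gt=[2.0, 0.0, 4.0, 10.0])
print(metrics)
```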

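Monocular Depth Estimation
• Depth estimation from a monocular camera plus motion
– Wide range of applications, including autonomous driving and AR. *1
– The release of high-accuracy datasets built from real data is expected to accelerate research further.
*1 Many papers share a similar problem setting and are spread across several categories;
those that output only depth appear to be categorized under 3D From a Single Image and Shape-From-X.
If you are interested, also check 3D From Multiview and Sensors and Motion and Tracking.
Improved performance from a high-accuracy, high-density dataset
https://openaccess.thecvf.com/content_CVPR_2020/html/Guizilini_3D_Packing_for_Self-Supervised_Monocular_Depth_Estimation_CVPR_2020_paper.html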
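Why camera motion carries depth information can be sketched with a pinhole model. This is a simplified illustration, not the method of the paper above: it assumes known intrinsics, a pure translation between frames (no rotation), and hypothetical function names. A pixel is lifted to 3D with its depth, moved into the second camera's coordinates, and projected again; nearer points shift more, and that parallax is the signal self-supervised methods can exploit.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with a given depth to a 3D point in camera coordinates."""
    return ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)


def project(point, fx, fy, cx, cy):
    """Project a 3D point in camera coordinates to pixel coordinates."""
    x, y, z = point
    return (fx * x / z + cx, fy * y / z + cy)


def reproject(u, v, depth, translation, intrinsics):
    """Where does pixel (u, v) of frame 1 land in frame 2?

    translation maps frame-1 camera coordinates to frame-2 camera
    coordinates (rotation is assumed to be identity in this sketch).
    """
    fx, fy, cx, cy = intrinsics
    x, y, z = backproject(u, v, depth, fx, fy, cx, cy)
    tx, ty, tz = translation
    return project((x + tx, y + ty, z + tz), fx, fy, cx, cy)


K = (500.0, 500.0, 320.0, 240.0)   # hypothetical fx, fy, cx, cy
forward_1m = (0.0, 0.0, -1.0)      # camera moves 1 m forward, so points get 1 m closer
near = reproject(420.0, 240.0, 5.0, forward_1m, K)
far = reproject(420.0, 240.0, 10.0, forward_1m, K)
print(near, far)
```

Comparing the colour at the reprojected location with the colour at the original pixel gives a photometric error that depends on the predicted depth, which is one common way to train without depth labels.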

Viewpoint Estimation
• Estimate where the object is being observed from
– An image is a 2D mapping of the 3D world, so understanding where in 3D space the object is observed from is important
– Work now addresses unsupervised settings and novel objects
Viewpoint estimation for novel objects
http://openaccess.thecvf.com/content_CVPR_2020/html/Banani_Novel_Object_Viewpoint_Estimation_Through_Reconstruction_Alignment_CVPR_2020_paper.html
Unsupervised viewpoint estimation using a generative model
https://openaccess.thecvf.com/content_CVPR_2020/html/Mustikovela_Self-Supervised_Viewpoint_Learning_From_Image_Collections_CVPR_2020_paper.html
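One simple way to make "where the object is observed from" concrete is to parameterize the viewpoint by azimuth and elevation and to measure the angle between a predicted and a true viewing direction. The sketch below is an assumption for illustration, not the parameterization of either paper: it ignores in-plane rotation and camera distance, and the function names are hypothetical.

```python
import math


def view_direction(azimuth_deg, elevation_deg):
    """Unit vector from the object centre toward the camera."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az),
            math.cos(el) * math.sin(az),
            math.sin(el))


def angular_error_deg(view_a, view_b):
    """Angle in degrees between two viewpoints given as (azimuth, elevation)."""
    a = view_direction(*view_a)
    b = view_direction(*view_b)
    dot = sum(x * y for x, y in zip(a, b))
    dot = max(-1.0, min(1.0, dot))   # guard acos against rounding
    return math.degrees(math.acos(dot))


error = angular_error_deg((0.0, 0.0), (90.0, 0.0))
print(error)
```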

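Part Decomposition
• Represent a 3D shape as a combination of parts
– Humans also perceive the 3D world as a collection of individual objects.
– The goal is to recognize a 3D shape as a combination of its constituent parts and to reach higher-level understanding, such as the relationships and hierarchy among parts.
Proposes a method for learning a structure-aware representation that models object parts at multiple levels of abstraction while accounting for the relationships among parts
http://openaccess.thecvf.com/content_CVPR_2020/html/Paschalidou_Learning_Unsupervised_Hierarchical_Part_Decomposition_of_3D_Objects_From_a_CVPR_2020_paper.html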

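Learnable Convex Decomposition
• A new 3D shape representation: differentiable convex decomposition
– An autoencoder (CvxNet) makes it possible to obtain, from a 2D image, a low-dimensional representation of a 3D shape in terms of convex elements
– This convex representation can be converted to polygon meshes and similar formats with relatively light processing, which makes it well suited to CG and other applications
https://openaccess.thecvf.com/content_CVPR_2020/html/Deng_CvxNet_Learnable_Convex_Decomposition_CVPR_2020_paper.html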
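A convex element can be described as an intersection of half-spaces, and a smooth version of that test is what makes the representation differentiable. The sketch below is written in the spirit of that idea and is not the CvxNet implementation: the smooth maximum via log-sum-exp, the division by `delta`, the default sharpness values, and the unit-cube example are assumptions chosen for illustration.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def convex_indicator(point, planes, delta=20.0, sigma=20.0):
    """Soft inside/outside value in (0, 1) for one convex element.

    planes: list of (normal, offset) with H(x) = normal . x + offset,
    negative on the inside of each half-space. The hard intersection
    test max_h H_h(x) < 0 is replaced by a smooth maximum (log-sum-exp)
    followed by a sigmoid, so the value is differentiable in the plane
    parameters.
    """
    distances = [sum(n_i * x_i for n_i, x_i in zip(normal, point)) + offset
                 for normal, offset in planes]
    m = max(distances)   # subtract the maximum for numerical stability
    smooth_max = m + math.log(sum(math.exp(delta * (h - m))
                                  for h in distances)) / delta
    return sigmoid(-sigma * smooth_max)


# Hypothetical example: an axis-aligned unit cube centred at the origin.
cube = [((1.0, 0.0, 0.0), -0.5), ((-1.0, 0.0, 0.0), -0.5),
        ((0.0, 1.0, 0.0), -0.5), ((0.0, -1.0, 0.0), -0.5),
        ((0.0, 0.0, 1.0), -0.5), ((0.0, 0.0, -1.0), -0.5)]

inside_value = convex_indicator((0.0, 0.0, 0.0), cube)
outside_value = convex_indicator((1.0, 0.0, 0.0), cube)
print(inside_value, outside_value)
```

Because each element is bounded by a small set of planes, turning it into a polygon mesh amounts to intersecting those planes, which is consistent with the slide's remark that conversion is relatively light.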

Spatial Reasoning
• Investigating the spatial reasoning ability of DNNs
– From three-view drawings in 2D, humans can imagine an object's 3D shape and reason about the spatial relationships involved. What about DNNs?
– Now that 3D is such a hot topic, is it time to revisit this question?
Devises three 2D-3D reasoning tasks based on three-view drawings: selecting the image with view consistency, estimating camera pose, and shape generation
http://openaccess.thecvf.com/content_CVPR_2020/html/Han_SPARE3D_A_Dataset_for_SPAtial_REasoning_on_Three-View_Line_Drawings_CVPR_2020_paper.html
