DynamicFusion: Reconstruction and Tracking of Non-rigid Scenes in Real-Time
Richard A. Newcombe, Dieter Fox, Steven M. Seitz
CVPR 2015, Best Paper Award
Paper introduction: Ken Sakurada (Tokyo Institute of Technology), June 23, 2015
Presenter
Ken Sakurada ( http://www.vision.is.tohoku.ac.jp/us/member/sakurada )
• Postdoctoral researcher, Tokyo Institute of Technology (from April 2015)
• Graduated from the Okatani Lab, Tohoku University (March 2015)
• Twitter ID: @sakuDken
• Research topics
– Spatio-temporal modeling of cities using vehicle-mounted cameras
– SfM, MVS, CNN, ...
• Main publications
– CVPR (Poster) 1 paper
– BMVC (Poster) 1 paper
– ACCV (Oral) 1 paper
• "Best Application Paper Honorable Mention Award"
– IROS, ...
If you notice anything about the content, I would appreciate hearing from you.
Email: sakurada@ok.ctrl.titech.ac.jp
sakurada@vision.is.tohoku.ac.jp
Authors of DynamicFusion
University of Washington
• Richard Newcombe (PhD)
– Author of KinectFusion and DTAM
• Dieter Fox (Professor)
– Author of "Probabilistic Robotics" (the standard SLAM reference)
• Steven Seitz (Professor)
– A leading figure in SfM and related areas
DynamicFusion
A dense SLAM system
• Fuses depth images to reconstruct a dynamic scene in 3D in real time
– Extends KinectFusion to dynamic scenes
Video: https://www.youtube.com/watch?v=i1eZekcc_lM
Figure 2 (from the paper): DynamicFusion takes an online stream of noisy depth maps (a: initial frame at t = 0 s; b: raw, noisy depth maps at t = 1, 10, 15, 20 s) and outputs a real-time dense reconstruction of the moving scene (d: canonical model; e: canonical model warped into its live frame). To achieve this, we estimate a volumetric warp (motion) field that transforms the canonical model space into the live frame, enabling the scene motion to be undone, and all depth maps to be densely fused into a single rigid TSDF reconstruction (d; f: model normals). Simultaneously, the structure of the warp field is constructed as a set of sparse 6D transformation nodes that are smoothly interpolated through a k-nearest node average in the canonical frame (c: node distance). The resulting per-frame warp field estimate enables the progressively denoised and completed scene geometry to be transformed into the live frame in real time (e).
KinectFusion
A dense SLAM system for static scenes
• Builds a dense surface model from multiple depth images
• Estimates the camera pose by registering the latest depth image against the model
"KinectFusion: Real-Time Dense Surface Mapping and Tracking" (ISMAR 2011)
Richard A. Newcombe, Shahram Izadi, Otmar Hilliges, David Molyneaux, David Kim, Andrew J. Davison,
Pushmeet Kohli, Jamie Shotton, Steve Hodges, Andrew Fitzgibbon
(Image: title page of the KinectFusion paper.) Figure 1: Example output from our system, generated in real-time with a handheld Kinect depth camera and no other sensing infrastructure. Normal maps (colour) and Phong-shaded renderings (greyscale) from our dense reconstruction system are shown. On the left for comparison is an example of the live, incomplete, and noisy data from the Kinect sensor (used as input to our system).
Video: https://www.youtube.com/watch?v=quGhaggn3cQ
KinectFusion: Processing pipeline
KinectFusion: Camera motion
• The camera pose is represented as a 6-DoF rigid-body transformation
– Transforms the local depth map into the global surface coordinate frame
KinectFusion: From depth map to vertices and normals
• Apply a bilateral filter to the depth map (noise reduction)
• Vertices (3D points):
  $\mathbf{v}_k = D_k(\mathbf{u})\, K^{-1} (u, v, 1)^\top$
  $D_k(\mathbf{u})$: depth at pixel $\mathbf{u} = (u, v)^\top$, $K$: camera intrinsic matrix
• Transform the 3D points from camera to world coordinates:
  $\mathbf{v}^w = T_w \mathbf{v}_k$
• Unit normal vectors
– Estimated from the 3D points of neighboring pixels:
  $\mathbf{N}(x, y) = \dfrac{\mathbf{a} \times \mathbf{b}}{\|\mathbf{a} \times \mathbf{b}\|}$
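To make the back-projection and normal estimation concrete, here is a minimal NumPy sketch (not the authors' code; the intrinsics K and depth array are assumed inputs, and normals are computed with simple forward differences):

```python
import numpy as np

def depth_to_vertex_map(depth, K):
    """Back-project a depth map: v_k = D_k(u) * K^-1 [u, v, 1]^T for every pixel."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)  # (h, w, 3)
    rays = pix @ np.linalg.inv(K).T            # K^-1 [u, v, 1]^T per pixel
    return rays * depth[..., None]             # scale each ray by its depth

def vertex_to_normal_map(vertices):
    """Estimate unit normals from neighbouring 3D points: N = (a x b) / |a x b|."""
    a = vertices[:, 1:, :] - vertices[:, :-1, :]   # difference to the right neighbour
    b = vertices[1:, :, :] - vertices[:-1, :, :]   # difference to the lower neighbour
    n = np.cross(a[:-1, :, :], b[:, :-1, :])       # crop so the shapes match
    return n / (np.linalg.norm(n, axis=-1, keepdims=True) + 1e-12)

# Usage with dummy data (intrinsic values are illustrative, not from the paper):
K = np.array([[525.0, 0, 319.5], [0, 525.0, 239.5], [0, 0, 1.0]])
depth = np.full((480, 640), 1.5)               # flat wall 1.5 m away
V = depth_to_vertex_map(depth, K)
N = vertex_to_normal_map(V)
```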
KinectFusion: Processing pipeline
Joint estimation of camera motion and surface
(Pipeline diagram: from depth to a dense oriented point cloud. Raw depth → Depth Map Conversion (measured vertex and normal maps) → Model-Frame Camera Track (ICP; ICP outliers rejected) → Volumetric Integration (TSDF Fusion, using the 6DoF pose and raw depth) → Model Rendering (TSDF Raycast) → predicted vertex and normal maps, fed back into tracking.)
3D reconstruction is possible once the camera motion is known
If the camera motion is known...
...the measurements can be fused (surface generation)...
...and once the 3D shape is known...
...a new surface measurement can be registered against it...
...minimizing the surface measurement error...
...yields the camera pose, so the depth maps can be fused...
KinectFusion: Surface representation
KinectFusion: Surface representation
Problems
• Depth maps have large measurement errors
• Depth maps contain holes (missing measurements)
Solution
• Use an implicit surface representation
Reference: http://slideplayer.com/slide/3892185/#
KinectFusion: Surface representation
Truncated Signed Distance Function (TSDF)
• Voxel grid
• [TSDF value] = [pixel depth] − [distance from the sensor to the voxel]
Reference: http://slideplayer.com/slide/3892185/#
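A hedged sketch of this projective TSDF value for a single voxel: project the voxel centre into the depth image and take the truncated difference between the measured depth and the voxel's depth along the optical axis (function and parameter names, and the truncation distance, are my own; the depth difference along z is a common simplification of the distance along the ray):

```python
import numpy as np

def projective_tsdf(voxel_center_cam, depth, K, trunc=0.03):
    """[TSDF value] = [pixel depth] - [depth of the voxel], truncated and normalised."""
    x, y, z = voxel_center_cam
    if z <= 0:
        return None                              # behind the camera: no measurement
    u = int(round(K[0, 0] * x / z + K[0, 2]))    # project the voxel centre into the image
    v = int(round(K[1, 1] * y / z + K[1, 2]))
    if not (0 <= u < depth.shape[1] and 0 <= v < depth.shape[0]):
        return None                              # projects outside the image
    d = depth[v, u]
    if d <= 0:
        return None                              # hole in the depth map
    sdf = d - z                                  # positive in front of the surface, negative behind
    if sdf < -trunc:
        return None                              # far behind the surface: occluded, do not update
    return min(sdf, trunc) / trunc               # truncate and normalise to [-1, 1]
```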
KinectFusion: Surface representation
Updating the TSDF $(F_k, W_k)$ ($F_k$: TSDF value, $W_k$: weight)
• Simply setting $W_{R_k}(\mathbf{p}) = 1$ already gives good results

From the paper: storing a weight $W_k(\mathbf{p})$ with each value allows the global minimum of the convex L2 de-noising metric to be obtained incrementally as more data terms are added, using a simple weighted running average, defined point-wise for $\{\mathbf{p} \mid F_{R_k}(\mathbf{p}) \neq \mathrm{null}\}$:

$F_k(\mathbf{p}) = \dfrac{W_{k-1}(\mathbf{p})\, F_{k-1}(\mathbf{p}) + W_{R_k}(\mathbf{p})\, F_{R_k}(\mathbf{p})}{W_{k-1}(\mathbf{p}) + W_{R_k}(\mathbf{p})} \qquad (11)$

$W_k(\mathbf{p}) = W_{k-1}(\mathbf{p}) + W_{R_k}(\mathbf{p}) \qquad (12)$

No update on the global TSDF is performed for values resulting from unmeasurable regions. While $W_k(\mathbf{p})$ provides weighting of the TSDF proportional to the uncertainty of the surface measurement, in practice simply letting $W_{R_k}(\mathbf{p}) = 1$, resulting in a simple average, provides good results. Moreover, the updated weight is truncated at some value $W_\eta$:

$W_k(\mathbf{p}) \leftarrow \min\!\left( W_{k-1}(\mathbf{p}) + W_{R_k}(\mathbf{p}),\ W_\eta \right) \qquad (13)$

(Annotation on the equation: the first term is the accumulation so far, the second term is the new observation.)
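A minimal sketch of this weighted running average over a voxel grid, assuming per-frame TSDF values are stored with NaN for unmeasured voxels, $W_{R_k} = 1$ as suggested above, and a hypothetical cap standing in for $W_\eta$:

```python
import numpy as np

def fuse_tsdf(F_prev, W_prev, F_new, W_new=1.0, w_max=100.0):
    """Weighted running average of TSDF values (Eqs. 11-13):
    F_k = (W_{k-1} F_{k-1} + W_Rk F_Rk) / (W_{k-1} + W_Rk),  W_k = min(W_{k-1} + W_Rk, W_eta)."""
    valid = ~np.isnan(F_new)                     # only update voxels with a new measurement
    F, W = F_prev.copy(), W_prev.copy()
    F[valid] = (W_prev[valid] * F_prev[valid] + W_new * F_new[valid]) / (W_prev[valid] + W_new)
    W[valid] = np.minimum(W_prev[valid] + W_new, w_max)
    return F, W
```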
KinectFusion: Sensor pose estimation
Minimize the point-to-plane energy (distance from points to planes)
From the paper: utilising the surface prediction $(\hat{V}_{k-1}, \hat{N}_{k-1})$, the global point-plane energy under the L2 norm for the desired camera pose estimate $T_{g,k}$ is

$E(T_{g,k}) = \sum_{\substack{\mathbf{u} \in \mathcal{U} \\ \Omega_k(\mathbf{u}) \neq \mathrm{null}}} \left\| \left( T_{g,k}\, \dot{\mathbf{V}}_k(\mathbf{u}) - \hat{\mathbf{V}}^g_{k-1}(\hat{\mathbf{u}}) \right)^{\!\top} \hat{\mathbf{N}}^g_{k-1}(\hat{\mathbf{u}}) \right\|_2 \qquad (16)$

where each global frame surface prediction is obtained using the previous fixed pose estimate $T_{g,k-1}$. The projective data association algorithm produces the set of vertex correspondences $\{\mathbf{V}_k(\mathbf{u}), \hat{\mathbf{V}}_{k-1}(\hat{\mathbf{u}}) \mid \Omega(\mathbf{u}) \neq \mathrm{null}\}$ by computing the perspectively projected point $\hat{\mathbf{u}} = \pi(K \tilde{T}_{k-1,k}\, \dot{\mathbf{V}}_k(\mathbf{u}))$ using an estimate of the frame-to-frame transform $\tilde{T}^z_{k-1,k} = T^{-1}_{g,k-1} \tilde{T}^z_{g,k}$, and testing the predicted and measured vertex and normal for compatibility. A threshold on the distance of vertices and the difference in normals suffices to reject grossly incorrect correspondences.

Point-to-plane ICP (Low 2004): introduced by Chen and Medioni, the object of minimization is the sum of squared distances between each source point and the tangent plane at its corresponding destination point. If $\mathbf{s}_i = (s_{ix}, s_{iy}, s_{iz}, 1)^\top$ is a source point, $\mathbf{d}_i = (d_{ix}, d_{iy}, d_{iz}, 1)^\top$ is the corresponding destination point, and $\mathbf{n}_i = (n_{ix}, n_{iy}, n_{iz}, 0)^\top$ is the unit normal at $\mathbf{d}_i$, then the goal of each ICP iteration is to find

$\mathbf{M}_{\mathrm{opt}} = \arg\min_{\mathbf{M}} \sum_i \left( (\mathbf{M}\mathbf{s}_i - \mathbf{d}_i) \cdot \mathbf{n}_i \right)^2 \qquad (1)$

where $\mathbf{M}$ and $\mathbf{M}_{\mathrm{opt}}$ are 4×4 3D rigid-body transformation matrices.

(Figure 1 of Low 2004: point-to-plane error between a source surface and a destination surface; each source point $\mathbf{s}_i$ is paired with a destination point $\mathbf{d}_i$, its unit normal $\mathbf{n}_i$, and the tangent plane at $\mathbf{d}_i$.)

Figure reference: Low, Kok-Lim. "Linear least-squares optimization for point-to-plane ICP surface registration." Chapel Hill, University of North Carolina (2004).

Error between the two surfaces
KinectFusion: Sensor pose estimation
• Minimize the point-to-plane distance
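As a concrete illustration of minimizing the point-to-plane energy, here is a minimal sketch of one linearized Gauss-Newton step (small-angle approximation; correspondences are assumed given, whereas KinectFusion finds them by projective data association; names are my own):

```python
import numpy as np

def point_to_plane_step(src, dst, normals):
    """One linearized step of point-to-plane ICP.
    Minimizes sum_i ((R(w) s_i + t - d_i) . n_i)^2 with R(w) ~ I + [w]_x."""
    # Residual r_i = (s_i - d_i) . n_i; Jacobian row is [s_i x n_i, n_i] w.r.t. (w, t).
    A = np.hstack([np.cross(src, normals), normals])      # (N, 6)
    b = -np.sum((src - dst) * normals, axis=1)             # (N,)
    x, *_ = np.linalg.lstsq(A, b, rcond=None)               # x = [w, t]
    wx, wy, wz, tx, ty, tz = x
    R = np.array([[1, -wz, wy],
                  [wz, 1, -wx],
                  [-wy, wx, 1]])                            # small-angle rotation
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, [tx, ty, tz]
    return T                                                # apply to src, then iterate
```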
KinectFusion: Sensor pose estimation
Point-plane correspondence and pose optimization
• Use vertex and normal map pyramids
– Build depth and normal maps at $L = 3$ resolutions: $\mathbf{V}^{l \in \{1..L\}}$, $\mathbf{N}^{l \in \{1..L\}}$
• Coarse-to-fine optimization (see the sketch below)
Figure reference: http://razorvision.tumblr.com/post/15039827747/how-kinect-and-kinect-fusion-kinfu-work
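A sketch of the L = 3 coarse-to-fine depth pyramid (simple 2×2 block averaging that ignores missing pixels; the real system works on the bilateral-filtered depth and recomputes vertex/normal maps per level, so treat this only as an illustration):

```python
import numpy as np

def build_depth_pyramid(depth, levels=3):
    """Return [full res, 1/2 res, 1/4 res] depth maps; zeros are treated as missing."""
    pyramid = [depth]
    for _ in range(levels - 1):
        d = pyramid[-1]
        h, w = (d.shape[0] // 2) * 2, (d.shape[1] // 2) * 2
        blocks = d[:h, :w].reshape(h // 2, 2, w // 2, 2)   # group pixels into 2x2 blocks
        valid = blocks > 0
        summed = (blocks * valid).sum(axis=(1, 3))
        count = valid.sum(axis=(1, 3))
        pyramid.append(np.where(count > 0, summed / np.maximum(count, 1), 0.0))
    return pyramid
```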
KinectFusion: Experimental results
Frame-to-frame tracking
• Accumulated error causes drift
Frame-to-model tracking
• Scan matching against the global model
• Drift-free
• More accurate than frame-to-frame tracking
KinectFusion: Experimental results
Voxel resolution vs. processing time
From top to bottom:
• Depth map integration
• Raycasting for surface generation
• Camera pose optimization using the pyramid maps
• Correspondence search at each pyramid scale
• Depth map preprocessing
(Plot: processing time in ms versus voxel resolution, from 64³ to 512³.) Figure 12 (from the paper): A reconstruction result using 1/64 the memory (64³ voxels) of the previous figures, and using only every 6th sensor frame, demonstrating graceful degradation with drastic reductions in memory and processing resources.
KinectFusion: Limitations
• Drift over long trajectories
– Explicit loop closing is required
• Sufficient geometric constraints are needed
– e.g., a single plane constrains only 3 degrees of freedom
• Extension to large-scale scenes
– A uniform voxel grid requires enormous memory and computation
– Since most of the volume is empty, an octree-based SDF can be used
From KinectFusion to DynamicFusion
Assumption in KinectFusion
• Most of the observed scene does not change
DynamicFusion
• Extends KinectFusion to dynamic, non-rigid scenes while keeping real-time performance
(Images: title pages of the two papers.)
"KinectFusion: Real-Time Dense Surface Mapping and Tracking" (Newcombe et al., ISMAR 2011) — see the citation above.
"DynamicFusion: Reconstruction and Tracking of Non-rigid Scenes in Real-Time" — Richard A. Newcombe, Dieter Fox, Steven M. Seitz, University of Washington, Seattle.
Figure 1 (DynamicFusion): Real-time reconstructions of a moving scene with DynamicFusion; both the person and the camera are moving. The initially noisy and incomplete model is progressively denoised and completed over time (left to right).
DynamicFusion: Overview
• Input: a stream of noisy depth images
• Output: a dense 3D reconstruction of the dynamic scene in real time
DynamicFusion: Overview
• The warp field is represented as a weighted average of the warps (6 DoF each) of sparse nodes
• Each voxel of the canonical space is warped into the live frame
(Figure 2 shown again.) From the paper: for each canonical point $v_c \in S$, $T_{lc} = \mathcal{W}(v_c)$ transforms that point from canonical space into the live, non-rigidly deformed frame of reference; the warp is interpolated from the k-nearest transformation nodes, and $SE3(\cdot)$ converts the blended result into an SE(3) transformation matrix.
(Figure labels: sparse nodes, interpolation)
Estimate the warp field $\mathcal{W}_t$
DynamicFusion: Overview
TSDF values are fused with respect to the live frame
• Rays of the live frame become curved in the canonical space
Figure 3 (from the paper): An illustration of how each point in the canonical frame maps, through a correct warp field, onto a ray in the live camera frame when observing a deforming scene. In (a) the first view of a dynamic scene is observed; in the corresponding canonical frame, the warp is initialized to the identity transform and the three rays shown in the live frame also map as straight lines in the canonical frame. As the scene deforms in the live frame (b), the warp function transforms each point from the canonical frame into the corresponding live-frame location, causing the corresponding rays to bend (c). Note that this warp can be achieved with two 6D deformation nodes (shown as circles), where the left node applies a clockwise twist. In (d) a new scene is shown that includes a cube about to occlude the bar (panels d-f illustrate the introduction of an occlusion).
DynamicFusion: Geometric representation
Truncated Signed Distance Function (TSDF)
For a $256^3$ voxel grid:
• KinectFusion
– Estimates only the camera pose (6 parameters)
• DynamicFusion
– Would have to estimate $6 \times 256^3$ parameters per frame
– About 10 million times more than KinectFusion
Estimating a warp field for every voxel is infeasible
DynamicFusion: Dense non-rigid warp field
Warp function
Basis: transformations of sparse nodes
$n$ transformation nodes $\mathcal{N}^t_{\mathbf{warp}} = \{\mathbf{dg}_v, \mathbf{dg}_w, \mathbf{dg}_{se3}\}$
• $\mathbf{dg}^i_v \in \mathbb{R}^3$: position in the canonical space
• $\mathbf{dg}^i_{se3} = T_{ic}$: transformation parameters of node $i$
• $\mathbf{dg}^i_w$: radial basis weight (influence radius of the node)
Influence of a node at a given point:
$w_i(x_c) = \exp\!\left( - \left\| \mathbf{dg}^i_v - x_c \right\|^2 \big/ \left( 2 (\mathbf{dg}^i_w)^2 \right) \right)$
$\mathbf{S}$: canonical space, $x_c \in \mathbf{S}$: center of each voxel
DynamicFusion: Dense non-rigid warp field
Warp function
Interpolation: a dense per-voxel warp
$\mathcal{W}(x_c) \equiv SE3(\mathbf{DQB}(x_c))$
$\mathbf{DQB}(x_c) = \dfrac{\sum_{k \in N(x_c)} \mathbf{w}_k(x_c)\, \hat{\mathbf{q}}_{kc}}{\left\| \sum_{k \in N(x_c)} \mathbf{w}_k(x_c)\, \hat{\mathbf{q}}_{kc} \right\|}$
Unit dual quaternion: $\hat{\mathbf{q}}_{kc} \in \mathbb{R}^8$
$SE3(\cdot)$: converts a dual quaternion into a 3D Euclidean transformation matrix
Background: Dual quaternions
Quaternions (William Rowan Hamilton, 1843)
• Rotation only
$\mathbf{q} = \cos\frac{\theta}{2} + (u_x \mathbf{i} + u_y \mathbf{j} + u_z \mathbf{k}) \sin\frac{\theta}{2}$
$\mathbf{i}^2 = \mathbf{j}^2 = \mathbf{k}^2 = \mathbf{ijk} = -1$
Unit vector: $\mathbf{u} = u_x \mathbf{i} + u_y \mathbf{j} + u_z \mathbf{k}$
• Rotating a 3D point $\mathbf{p} = p_x \mathbf{i} + p_y \mathbf{j} + p_z \mathbf{k}$:
$\mathbf{p}' = \mathbf{q}\, \mathbf{p}\, \mathbf{q}^{-1}$
Background: Dual quaternions
Dual quaternions (William Kingdon Clifford, 1873)
• Rotation + translation
$\dot{\mathbf{q}} = \mathbf{q}_r + \epsilon\, \mathbf{q}_d, \qquad \epsilon^2 = 0$
$\mathbf{p}' = \dot{\mathbf{q}}\, \mathbf{p}\, \dot{\mathbf{q}}^{-1}$
Rotation: $\mathbf{q}_r = r_w + r_x \mathbf{i} + r_y \mathbf{j} + r_z \mathbf{k} = \cos\frac{\theta}{2} + \sin\frac{\theta}{2} \cdot \mathbf{u}$
Translation: $\mathbf{q}_d = 0 + \frac{t_x}{2} \mathbf{i} + \frac{t_y}{2} \mathbf{j} + \frac{t_z}{2} \mathbf{k}$
Background: Dual Quaternion Blending (DQB)
Linear Blend Skinning (LBS)
$\mathbf{p}'_i = \sum_{j=1}^{n} w_{ij}\, T_j\, \mathbf{p}_i$
• Volume-loss (collapse) problem
– Caused by scale changes in the blended transformation
Reference: Kavan, Ladislav, et al. "Skinning with dual quaternions." Proceedings of the 2007 Symposium on Interactive 3D Graphics and Games. ACM, 2007.
Background: Dual Quaternion Blending (DQB)
Dual Quaternion Skinning (DQS)
$\dot{\mathbf{q}} = \dfrac{\sum_{i=1}^{n} w_i\, \dot{\mathbf{q}}_i}{\left\| \sum_{i=1}^{n} w_i\, \dot{\mathbf{q}}_i \right\|}$
Reference: Kavan, Ladislav, et al. "Skinning with dual quaternions." Proceedings of the 2007 Symposium on Interactive 3D Graphics and Games. ACM, 2007.
(Figure: comparison of LBS vs. DQS deformation)
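A minimal sketch of dual quaternion blending of several node transforms (pure NumPy; helper names are my own; each node's rotation is given as a unit quaternion [w, x, y, z] plus a translation vector):

```python
import numpy as np

def quat_mul(a, b):
    """Hamilton product of quaternions [w, x, y, z]."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def to_dual_quat(q_rot, t):
    """Dual quaternion q_real + eps * q_dual with q_dual = 0.5 * (0, t) * q_real."""
    return q_rot, 0.5 * quat_mul(np.array([0.0, *t]), q_rot)

def dqb(weights, dual_quats):
    """Dual Quaternion Blending: normalised weighted sum of unit dual quaternions."""
    real = sum(w * dq[0] for w, dq in zip(weights, dual_quats))
    dual = sum(w * dq[1] for w, dq in zip(weights, dual_quats))
    norm = np.linalg.norm(real)
    return real / norm, dual / norm

def dual_quat_to_rt(q_real, q_dual):
    """Recover the rotation quaternion and translation: t = 2 * q_dual * conj(q_real)."""
    conj = q_real * np.array([1, -1, -1, -1])
    return q_real, 2.0 * quat_mul(q_dual, conj)[1:]

# Toy example: blend an identity node with a node translated by (0.1, 0, 0), weights 0.7 / 0.3.
dq_a = to_dual_quat(np.array([1.0, 0, 0, 0]), np.zeros(3))
dq_b = to_dual_quat(np.array([1.0, 0, 0, 0]), np.array([0.1, 0.0, 0.0]))
print(dual_quat_to_rt(*dqb([0.7, 0.3], [dq_a, dq_b])))   # translation ~ (0.03, 0, 0)
```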
DynamicFusion: Dense non-rigid warp field
Warp field
$\mathcal{W}_t(x_c) = T_{lw}\, SE3(\mathbf{DQB}(x_c))$
• $T_{lw}$: rigid transformation (camera motion)
• $SE3(\mathbf{DQB}(x_c))$: per-voxel warp blended from the nodes
DynamicFusion: Dense non-rigid surface fusion
Sampled TSDF
$\mathcal{V}(\mathbf{x}) \mapsto [\,v(\mathbf{x}) \in \mathbb{R},\ w(\mathbf{x}) \in \mathbb{R}\,]^\top$
$v(\mathbf{x})$: weighted average of all projective TSDF values
$w(\mathbf{x})$: sum of the associated weights
DynamicFusion: Dense non-rigid surface fusion
After acquiring the latest depth image $D_t$:
• Transform each voxel center of the canonical space into the live-frame coordinate system:
$(x_t^\top, 1)^\top = \mathcal{W}_t(x_c)\, (x_c^\top, 1)^\top$
• Projective Signed Distance Function (psdf):
$\mathbf{psdf}(x_c) = \left[ K^{-1} D_t(u_c)\, (u_c^\top, 1)^\top \right]_z - \left[ x_t \right]_z$
$u_c = \pi(K x_t)$: pixel onto which the voxel center projects
(Figure labels: measured depth point, voxel center)
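A hedged sketch of the psdf computation for one canonical voxel centre: warp it into the live frame with $\mathcal{W}_t$, project with the intrinsics, and compare against the measured depth (the warp is represented here as a plain 4×4 matrix; function and variable names are my own):

```python
import numpy as np

def psdf(x_c, warp_matrix, depth, K):
    """psdf(x_c) = [K^-1 D_t(u_c) (u_c, 1)]_z - [x_t]_z, with x_t the warped voxel centre."""
    x_t = (warp_matrix @ np.append(x_c, 1.0))[:3]            # x_t from W_t(x_c)
    if x_t[2] <= 0:
        return None                                           # behind the camera
    u = K @ x_t
    u_c = np.round(u[:2] / u[2]).astype(int)                  # u_c = pi(K x_t)
    if not (0 <= u_c[0] < depth.shape[1] and 0 <= u_c[1] < depth.shape[0]):
        return None
    d = depth[u_c[1], u_c[0]]
    if d <= 0:
        return None                                           # hole in the depth map
    measured = np.linalg.inv(K) @ np.array([u_c[0], u_c[1], 1.0]) * d
    return measured[2] - x_t[2]                                # signed distance along the camera z-axis
```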
DynamicFusion: Dense non-rigid surface fusion
TSDF update:
$\mathcal{V}(\mathbf{x})_t = \begin{cases} [\,v'(\mathbf{x}),\ w'(\mathbf{x})\,]^\top & \text{if } \mathbf{psdf}(\mathbf{dc}(\mathbf{x})) > -\tau \\ \mathcal{V}(\mathbf{x})_{t-1} & \text{otherwise} \end{cases}$
$\mathbf{dc}(\cdot)$: maps a voxel index to the voxel center coordinates (in the TSDF volume)
$\tau > 0$: truncation distance threshold
$v'(\mathbf{x}) = \dfrac{v(\mathbf{x})_{t-1}\, w(\mathbf{x})_{t-1} + \min(\rho, \tau)\, w(\mathbf{x})}{w(\mathbf{x})_{t-1} + w(\mathbf{x})}, \qquad \rho = \mathbf{psdf}(\mathbf{dc}(\mathbf{x}))$
$w'(\mathbf{x}) = \min\!\left( w(\mathbf{x})_{t-1} + w(\mathbf{x}),\ w_{\max} \right)$
$w(\mathbf{x}) \propto \dfrac{1}{k} \sum_{i \in N(x_c)} \left\| \mathbf{dg}^i_v - x_c \right\|_2$
DynamicFusion: Estimating the warp field $\mathcal{W}_t$
Energy function
$E(\mathcal{W}_t, \mathcal{V}, D_t, \mathcal{E}) = \mathbf{Data}(\mathcal{W}_t, \mathcal{V}, D_t) + \lambda\, \mathbf{Reg}(\mathcal{W}_t, \mathcal{E})$
• Data: dense ICP cost from the model to the latest frame
• Reg: penalty on non-smooth motion fields
$\mathcal{V}$: current 3D geometry
$D_t$: latest depth map
$\mathcal{E}$: set of edges
DynamicFusion: Estimating the warp field $\mathcal{W}_t$
Current surface model
• Extracted from the TSDF $\mathcal{V}$ with the marching cubes algorithm
– The surface is generated from the zero-level isosurface
• Stored as a polygon mesh in the canonical space
– Vertex-normal pairs: $\hat{\mathcal{V}}_c \equiv \{V_c, N_c\}$
DynamicFusion: Estimating the warp field $\mathcal{W}_t$
Data term
$\mathbf{Data}(\mathcal{W}, \mathcal{V}, D_t) \equiv \sum_{u \in \Omega} \psi_{\mathbf{data}}\!\left( \hat{\mathbf{n}}_u^\top (\hat{\mathbf{v}}_u - \mathbf{vl}_{\tilde{u}}) \right)$
$\psi_{\mathbf{data}}$: robust Tukey penalty function
• $\hat{\mathbf{v}}_u, \hat{\mathbf{n}}_u$: predicted vertex and normal (warped from the canonical space into the live frame)
• $\mathbf{vl}_{\tilde{u}}$: measured point (depth)
• The residual is the error along the vertex normal direction in the live frame
Excerpt on robust estimation (iteratively reweighted least squares): this gives the "solution" as a simple least-squares problem,
$\hat{a} = \left( \sum_i w_i\, x_i x_i^\top \right)^{-1} \sum_i w_i\, y_i x_i \qquad (8)$
Note that this solution depends on the $w_i$ values, which in turn depend on $\hat{a}$. The idea is to alternate between calculating $\hat{a}$ and recalculating $w_i = w\!\left( (y_i - \hat{a}^\top x_i)/\sigma_i \right)$. For the Cauchy $\rho$ function the associated weight is
$w_C(u) = \dfrac{u}{1 + (u/c)^2} \qquad (9)$
and for the Beaton-Tukey $\rho$ function
$w_T(u) = \begin{cases} \left( 1 - \left( \frac{u}{a} \right)^2 \right)^2 & |u| \le a \\ 0 & |u| > a \end{cases} \qquad (10)$
(Plot: example of the Beaton-Tukey $\rho$ function.)
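For reference, a small sketch of the Beaton-Tukey weight used for $\psi_{\mathbf{data}}$ in an IRLS scheme (the constant a is a free parameter; the value 4.685, common for unit-variance residuals, is my assumption and not from the paper):

```python
import numpy as np

def tukey_weight(u, a=4.685):
    """Beaton-Tukey weight: (1 - (u/a)^2)^2 for |u| <= a, 0 otherwise."""
    u = np.asarray(u, dtype=float)
    w = (1.0 - (u / a) ** 2) ** 2
    return np.where(np.abs(u) <= a, w, 0.0)

# Residuals far from zero are down-weighted to exactly zero (outlier rejection):
print(tukey_weight([0.0, 2.0, 10.0]))   # -> [1.0, ~0.67, 0.0]
```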
DynamicFusion: Estimating the warp field $\mathcal{W}_t$
Regularization term
• Constrains the motion in regions not observed in the latest frame
$\mathbf{Reg}(\mathcal{W}, \mathcal{E}) \equiv \sum_{i=0}^{n} \sum_{j \in \mathcal{E}(i)} \alpha_{ij}\, \psi_{\mathbf{reg}}\!\left( \mathbf{T}_{ic}\, \mathbf{dg}^j_v - \mathbf{T}_{jc}\, \mathbf{dg}^j_v \right)$
$\psi_{\mathbf{reg}}$: Huber penalty
$\alpha_{ij} = \max\!\left( \mathbf{dg}^i_w,\ \mathbf{dg}^j_w \right)$
• A constraint that keeps the edge between nodes $i$ and $j$ as rigid as possible
Huber loss function:
$L_\delta(a) = \begin{cases} \frac{1}{2} a^2 & |a| \le \delta \\ \delta \left( |a| - \frac{1}{2}\delta \right) & \text{otherwise} \end{cases}$
(Plot: example of the Huber loss function compared with the squared error loss.)
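Similarly, a sketch of the Huber penalty used for $\psi_{\mathbf{reg}}$ (the threshold δ below is purely illustrative):

```python
import numpy as np

def huber_loss(a, delta=0.01):
    """Huber penalty: 0.5*a^2 for |a| <= delta, delta*(|a| - 0.5*delta) otherwise."""
    a = np.asarray(a, dtype=float)
    quad = 0.5 * a ** 2
    lin = delta * (np.abs(a) - 0.5 * delta)
    return np.where(np.abs(a) <= delta, quad, lin)

# Quadratic near zero, linear growth for large deviations between neighbouring node transforms:
print(huber_loss([0.0, 0.005, 0.1]))
```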
DynamicFusion: Estimating the warp field $\mathcal{W}_t$
Minimize the energy function $E$
• Gauss-Newton method
– Approximate Hessian: $\mathbf{J}^\top \mathbf{J} = \mathbf{J}_d^\top \mathbf{J}_d + \lambda\, \mathbf{J}_r^\top \mathbf{J}_r$
– Block arrow-head matrix
– Solved efficiently with a block Cholesky decomposition
(Figure: structure of the arrow-head matrix)
DynamicFusion: Inserting new nodes
• Unsupported surface vertices (see the sketch below):
$\min_{k \in N(v_c)} \dfrac{\left\| \mathbf{dg}^k_v - v_c \right\|}{\mathbf{dg}^k_w} \ge 1$
• New nodes $\mathbf{dg}^*_v \in \widetilde{\mathbf{dg}}_v$
– The transformation parameters of a new node are initialized from the surrounding nodes using DQB:
$\mathbf{dg}^*_{se3} \leftarrow \mathcal{W}_t(\mathbf{dg}^*_v)$
• Update:
$\mathcal{N}^t_{\mathbf{warp}} = \mathcal{N}^{t-1}_{\mathbf{warp}} \cup \{ \widetilde{\mathbf{dg}}_v, \widetilde{\mathbf{dg}}_{se3}, \widetilde{\mathbf{dg}}_w \}$
(Figure labels: surface vertices; node position and influence radius)
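A sketch of the node-insertion test and initialization above (brute-force nearest-node search; the paper additionally subsamples the candidate vertices to keep node density roughly uniform, which is omitted here; the default radius and all names are my own assumptions):

```python
import numpy as np

def unsupported_vertices(vertices, node_pos, node_radius):
    """A vertex is unsupported if min_k ||dg_v^k - v_c|| / dg_w^k >= 1 (outside every node's radius)."""
    d = np.linalg.norm(vertices[:, None, :] - node_pos[None, :, :], axis=-1)   # (V, N) distances
    return np.min(d / node_radius[None, :], axis=1) >= 1.0

def insert_nodes(node_pos, node_se3, node_radius, vertices, warp_fn, radius=0.025):
    """Insert new nodes at unsupported vertices; initialise dg_se3* <- W_t(dg_v*) via warp_fn."""
    mask = unsupported_vertices(vertices, node_pos, node_radius)
    new_pos = vertices[mask]
    new_se3 = np.stack([warp_fn(p) for p in new_pos]) if len(new_pos) else np.empty((0, 4, 4))
    new_radius = np.full(len(new_pos), radius)
    return (np.vstack([node_pos, new_pos]),
            np.concatenate([node_se3, new_se3]),
            np.concatenate([node_radius, new_radius]))
```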
DynamicFusion: Experimental results
Figure 5 (from the paper): Real-time non-rigid reconstructions for two deforming scenes, "drinking from a cup" and "crossing fingers". The upper rows of (a) and (b) show the canonical models as they evolve over time; the lower rows show the corresponding warped geometries tracking the scene. In (a) complete models of the arm and the cup are obtained. Note the system's ability to deal with large motion and to add surfaces not visible in the initial scene, such as the bottom of the cup and the back side of the arm. In (b) full body motions are shown, including clasping of the hands, where the model stays consistent throughout the interaction.
DynamicFusion: Limitations
• Weak against rapid changes of the graph structure
– Prone to failure when the topology suddenly transitions from closed to open
– Nodes that are connected cannot suddenly be separated
• Failure modes shared with other real-time incremental tracking methods
– Failure to close loops
– Too large motion between frames
Summary
• Proposed a dense, real-time 3D reconstruction method for dynamic scenes
– No static-scene assumption
• Generalized 3D TSDF fusion to non-rigid scenes
• Estimates a 3D (6-DoF) warp field in real time
END