Fast Structure from Motion for
  Planar Image Sequences
        Andreas Weishaupt, Luigi Bagnato, Pierre Vandergheynst




                                                   ÉCOLE POLYTECHNIQUE
                                                   FÉDÉRALE DE LAUSANNE

Motion

         Depth Map




Motivations
   Cinema 3D
   Philips 2D+Depth
   Autonomous Navigation (SLAM)
   3D scanning/modeling

   Target:
   - Real-time performance
   - Good accuracy



Problem Formulation
We consider only two consecutive frames, I0 and I1.

Brightness Consistency Equation (BCE)

$$I_1(x + u) - I_0(x) = 0$$

If we assume that the motion between frames is small, we can linearize:

Linearization

$$I_1(x) + \nabla I_1(x)^T u - I_0(x) \approx 0$$

   f: focal length
   t: camera translation
   d: distance/depth
   Ω: camera rotation
   u: optical flow
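A minimal sketch of the linearization step, under stated assumptions: the 1-D "image" `image0`, the shifted second frame `image1`, and the finite-difference `gradient` helper are all hypothetical and only serve to illustrate the idea. For a small displacement the linearized BCE residual is close to zero, and solving it for u recovers the displacement approximately.

```python
import math


def image0(x):
    """Smooth synthetic 1-D 'image' (an assumption made for illustration)."""
    return math.sin(0.3 * x) + 0.5 * math.cos(0.1 * x)


def image1(x, true_shift):
    """Second frame, built so that I1(x + u) = I0(x) holds for u = true_shift."""
    return image0(x - true_shift)


def gradient(f, x, h=1e-4):
    """Central finite-difference derivative of f at x."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def linearized_residual(x, u, true_shift):
    """Linearized BCE residual: I1(x) + dI1/dx * u - I0(x)."""
    def i1(s):
        return image1(s, true_shift)
    return i1(x) + gradient(i1, x) * u - image0(x)


def flow_from_linearized_bce(x, true_shift):
    """Solve the linearized BCE for u at one position (needs dI1/dx != 0)."""
    def i1(s):
        return image1(s, true_shift)
    return (image0(x) - i1(x)) / gradient(i1, x)


if __name__ == "__main__":
    # Small motion: the residual is near zero and the estimate is near 0.1.
    print(linearized_residual(2.0, 0.1, 0.1))
    print(flow_from_linearized_bce(2.0, 0.1))
```

The approximation error grows with the square of the displacement, which is why the small-motion assumption matters.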
They rely on finding pairs of corresponding points in successive images. This has the following consequences:
   • The final result depends on the quality of the found correspondence. If the match is not exact the reconstruction will not be accurate.
   • Finding correspondences is a computationally expensive task: dense reconstruction cannot be performed in real-time.
   • For real-time reconstruction, the recovery has to be limited to some few feature points. Often, tracking of found feature points is employed to reduce additional computation cost. Tracking introduces a second source of error for 3D reconstruction.
Another class of recent methods to obtain dense depth maps is based on the fusion of sparse depth maps by image registration techniques, e.g. [8]. Those have the advantage that traditional structure from motion systems can be employed. However, in order to provide accurate results, a large number of depth maps has to be input for such methods. Finally, in [10] it is shown how robust depth map re[...]

Problem Formulation - Projection Model

Figure 1: pinhole camera model and rigid camera motion.
Figure 2: side view of the camera motion with the projections on the sensor plane and on the parallel object plane.

We model the camera movement during acquisition of successive frames by the rigid 3D translation $t = (t_x, t_y, t_z)^T$. Consequently, during camera motion, a point $p = (X, Y, Z)^T$ becomes $p' = p - \Delta p = p - t$. We parametrize $p = d(r)\, e_r$, where $r = (x, y, f)$ is a point on the sensor plane, f the focal distance and d(r) the distance or depth from the optical center to a point in the scene.

   Z: depth map = inverse of d

1. Parallel projection
The central projection on the object plane parallel to the sensor plane is given by:

$$t_p = \frac{r_z}{\|r\| Z(r)} \cdot \frac{r + \|r\| Z(r)\, t}{r_z + \|r\| Z(r)\, t_z} - \frac{r}{\|r\| Z(r)} \qquad (2)$$

Let us define Eq. 2 as the parallel projection.

2. Optical flow
The optical flow u is defined as the apparent motion of the brightness pattern between two images. It can be approximated by the following projection on the sensor plane:

$$u = r_z \cdot \frac{r + \|r\| Z(r)\, t}{r_z + \|r\| Z(r)\, t_z} - r \qquad (3)$$

Eq. 3 shows the nonlinear dependency of the estimated optical flow on the depth map Z(r) as well as on the translation $t_z$ perpendicular to the sensor plane. In this nonlinear form, it is difficult to include the projection in a variational framework.

Linearization
Nevertheless, combining Eqs. 2 and 3 we find a linearized relationship between the parallel projection and the optical flow:

$$u = \|r\| Z(r)\, t_p \qquad (4)$$
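The relationship between the parallel projection (2), the optical flow (3) and the linearized form (4) can be checked numerically. The sketch below assumes that the symbol dropped in front of r in the slide text is the Euclidean norm ‖r‖, and it follows the sign convention of Eqs. 2 and 3 as printed. The function names are hypothetical.

```python
import math


def norm(v):
    """Euclidean norm of a 3-vector."""
    return math.sqrt(sum(c * c for c in v))


def parallel_projection(r, Z, t):
    """Eq. 2: motion t_p projected on the object plane parallel to the sensor."""
    s = norm(r) * Z                  # ||r|| Z(r)
    denom = r[2] + s * t[2]          # r_z + ||r|| Z(r) t_z
    return tuple((r[2] / s) * (r[i] + s * t[i]) / denom - r[i] / s
                 for i in range(3))


def optical_flow(r, Z, t):
    """Eq. 3: optical flow on the sensor plane, nonlinear in Z and t_z."""
    s = norm(r) * Z
    denom = r[2] + s * t[2]
    return tuple(r[2] * (r[i] + s * t[i]) / denom - r[i] for i in range(3))


def linearized_flow(r, Z, t):
    """Eq. 4: u = ||r|| Z(r) t_p, linear in Z once t_p is given."""
    s = norm(r) * Z
    return tuple(s * c for c in parallel_projection(r, Z, t))


if __name__ == "__main__":
    r = (0.2, -0.1, 1.0)             # sensor point (x, y, f), illustrative values
    t = (0.05, 0.02, 0.1)            # camera translation, illustrative values
    print(optical_flow(r, 0.5, t))
    print(linearized_flow(r, 0.5, t))
```

Both functions return the same vector up to rounding, and its third component is zero because the flow lies in the sensor plane. For a translation with t_z = 0 the parallel projection reduces to t itself.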
 
3. TV-L1 DEPTH FROM MOTION

Problem formulation

BC Equation

$$I_1(x) + \nabla I_1(x)^T u - I_0(x) \approx 0$$

Projection Model

$$u = \|r\| Z(r)\, t_p$$

Linearization with respect to a known $u_0$

We assume for the moment that we know the camera translation parameters t for two successive frames I0 and I1. Furthermore, we assume that the brightness does not change between those images. Using the definition of optical flow and the projection in Eq. 4, we can express the image residual ρ(Z) as in [6]:

$$\rho(Z) = I_1(x + u_0) + \nabla I_1^T \left( \|r\| Z\, t_p - u_0 \right) - I_0 \qquad (5)$$

Data Term - bilinear in Z and $t_p$

If we know the motion t: we cast the depth estimation as a TV-L1 optimization problem.

A depth map Z = Z(r) can be obtained by solving the following optimization problem [2]:

$$Z^* = \arg\min_Z \sum_{x \in D} |\nabla Z| + \lambda \sum_{x \in D} |\rho(Z, I_0, I_1)| \qquad (6)$$

where D is the discrete domain of pixels and x their position on the image. The left term in Eq. 6 represents the regularization term. Here we set it to the TV norm of Z, which imposes a sparseness constraint on Z and acts edge-preserving. The right term is the data term, which we set to the image residual as defined in Eq. 5. We have chosen the robust L1 norm as it has some advantages when compared to the usually employed L2 norm [9]. Eq. 6 is not a strictly convex optimization problem. An auxiliary variable V, a close approximation of Z, is introduced, and the problem is solved with an alternating iteration scheme:

   1. For fixed Z, solve for V.
   2. For fixed V, solve for Z.

If we have an estimate of Z: we estimate the ego-motion by least squares.

$$t^* = \arg\min_t \sum_{x \in D} \left( I_1 - I_0 + \nabla I_1^T \left( \|r\| Z\, t_p - u_0 \right) \right)^2$$
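In TV-L1 schemes of this kind, the step that solves for V at fixed Z is commonly a pointwise thresholding. The sketch below is an illustration under assumptions, not the authors' implementation: it assumes the usual quadratic coupling energy (V − Z)²/(2θ) + λ|ρ(V)| per pixel, and it uses the fact that the residual of Eq. 5 is linear in the depth, ρ(V) = ρ(Z) + a (V − Z), with slope a = ‖r‖ ∇I1ᵀ t_p. The names `threshold_step` and `pointwise_energy` are hypothetical.

```python
def pointwise_energy(V, Z, rho, a, lam, theta):
    """Per-pixel energy (V - Z)^2 / (2 theta) + lam * |rho + a (V - Z)|."""
    return (V - Z) ** 2 / (2.0 * theta) + lam * abs(rho + a * (V - Z))


def threshold_step(Z, rho, a, lam, theta):
    """Closed-form minimizer of pointwise_energy over V.

    Z     : current depth value at the pixel
    rho   : image residual rho(Z) at the pixel
    a     : slope of the residual with respect to depth (assumed notation)
    lam   : data-term weight lambda
    theta : coupling parameter
    """
    if a == 0.0:
        return Z                       # residual does not depend on the depth
    bound = lam * theta * a * a
    if rho < -bound:
        return Z + lam * theta * a     # residual stays negative at the optimum
    if rho > bound:
        return Z - lam * theta * a     # residual stays positive at the optimum
    return Z - rho / a                 # residual can be driven exactly to zero


if __name__ == "__main__":
    for rho in (1.0, -1.0, 0.2):
        print(rho, threshold_step(0.5, rho, 2.0, 1.0, 0.1))
```

The three branches correspond to the three possible signs of the residual at the minimizer, so the update never moves V by more than λθ|a| away from Z.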
Camera Ego-Motion Estimation

4. EGO-MOTION ESTIMATION

Let us assume now that we have an estimate of the depth map Z. Given two successive images I0 and I1 we can recover the camera translation parameters by optimizing the L2 norm of the image residual with respect to t:

$$t^* = \arg\min_t \sum_{x \in D} \left( I_1 - I_0 + \nabla I_1^T \left( \|r\| Z\, t_p - u_0 \right) \right)^2$$

For $u_0 = 0$, the minimizer satisfies:

$$\sum_{x \in D} \left( I_1 - I_0 + \|r\| Z\, t_p^T \nabla I_1 \right) \|r\| Z\, \nabla I_1 = 0 \qquad (12)$$

In the special case of camera movement parallel to the sensor plane, $t = (t_x, t_y, 0)^T$, solving Eq. 12 results in the linear system of equations $A(x)\, b = c(x)$ with

$$A(x) = \begin{pmatrix} \sum_{x \in D} \|r\|^2 Z^2 \left( \frac{\partial I_1}{\partial x} \right)^2 & \sum_{x \in D} \|r\|^2 Z^2 \frac{\partial I_1}{\partial x} \frac{\partial I_1}{\partial y} \\ \sum_{x \in D} \|r\|^2 Z^2 \frac{\partial I_1}{\partial x} \frac{\partial I_1}{\partial y} & \sum_{x \in D} \|r\|^2 Z^2 \left( \frac{\partial I_1}{\partial y} \right)^2 \end{pmatrix}$$

[...] and show how to combine the depth from motion estimation described in Section 3 and the ego-motion estimation described in Section 4. Since both parts rely on each other, it is very likely that we can combine them by performing alternating depth and ego-motion estimation. We find that it is best to include the alternation scheme in the multi-scale approach:

   1. At the coarsest resolution level L, the estimates are initialized with some small constant and we first solve for the depth map ${}^{L}Z$ as explained in Section 3. Since the ego-motion parameters are zero, the estimated depth map will be very flat.
   2. With the flat depth map as input we estimate the motion parameters according to Section 4.
   3. Given the estimated motion parameters ${}^{k+1}t$ and the depth map ${}^{k+1}Z$, we first estimate the optical flow $u_0 = \|r\| Z(r)\, t_p$, then [...] and we compute the depth map at level k, ${}^{k}Z$.

This can be repeated until level 0 is reached, where we obtain the final depth map ${}^{0}Z$.
                                                                  
                       2 2 ∂ I1
                          
                                       2
                                                        ∂ I1 ∂ I1
                                           ∑x∈D r 2 Z 2 ∂ x ∂ y        4. From the refined depth map k Z, we compute the motion Z(r)tp ,
                                                                                                                                         r
          ∑x∈D r Z ∂ x                                                                  k t.                  − ∑x∈D r Z ∂ Ix (I1 − I0 )
                                                                                                                                   1
                                                                 2  2 b = (tx , ty )
                                                                            parameters T
 A(x) =                                                                                           c(x) = I                     ∂
                                                                                               2 Z 2 ∂ I1 ∂until the ∈D r resolution is ) the re
                                                                                                             1 − ∑x finest Z ∂4. 1From
                                                                                                                                             .
                        2 Z 2 ∂ I1 ∂ I1          2Z 2 2∂ I1 ∂ I1
                                           ∑x∈Drr Z ∂ y                 5. Steps 3 andr are repeated
                                                                                                                                  I1
                                                                                                                                     (I − I0
                          ∂ x ∑x∈D                                         ∑x∈D
                                                  2                                        4
              ∑x∈D r               ∂y                                                                                           ∂y
                                                                ∂x          reached and the final depth map 0 Zobtained.
                                                                                                     ∂x ∂y
          A(x) =                                                                                     2 
                                                                                                                   is
                                                                                                                          parameters
                                                                                     For general camera motion, we can solve Eq. 12 by iter
   and                             ∂ I1 ∂ I1                                               2 ∂ I1
                   ∑x∈D , r )T Z 2 ∂ x ∂ y
    General case t = (t , t t
                              2                                                    r 2 Z 6. e.g. Levenberg-Marquardt or gradient descen
                                                                       ∑x∈D tive methods,RESULTSwhere x contains Steps translatio
                                                                                                ∂y
                                                                                 xn+1 = xn + γ∇E(xn )
                                                                                                                       5.
                                                                                                                           the three
                                                                                                                                     3 and
                             x1 y z
                 − ∑x∈D r Z ∂ Ix (I1 − I0 )                                                                               reached and
                                                                    In order to verify our approach, we use synthetic images of
                                                                    size 512 × 512 and ground truth depth maps of the image residual,
                                                                                 parameters and E is the energy generated by
          c(x) =            ∂                       .
         Gradiend−descent ∂∂Iy1 (I1 − I0 )
                   ∑x∈D r Z
                                                             ray-tracing of a 3D model of a living room. We have gen-
                and                                          erated multiple sequences∂for =
                                                                                         E various types of camera trans-∂ u
    For general camera motion, we can solve Eq. 12 by itera-
                                                                                       ∂ xi x∈D
                                                             lation, i.e. for movement parallel
                                                                                                ∑andI1perpendicular to the∂ xi .
                                                                                                                 T
                                                                                                        − I0 + ∇I1 u ∇I1   T
tive methods, e.g. Levenberg-Marquardt or gradient descent:
x n+1 = xn + γ∇E(xn ) where x contains the three translation image plane as well as for linear combinations of both. The
                                                               ∂ I1
parameters and E is the energy of the image− ∑x∈D r Z ∂ x (Iis to evaluatepartial derivatives of u with respect toto veri
                                             residual,       purpose 1 − I0 ) first ego-motion estimation In order the motio
                                                                               The                               and depth
                            c(x) =                                         seperately. .
                                                             from motion parameters are given by the Jacobian matrix
                                                               ∂ IWe run the ego-motion estimation with ground truth × 512
                                                                                                                size 512
                                          T − ∑x∈D r Z ∂ y (I1 − I0 )
                                                                  1
            ∂E                              ∂u
                 = ∑ I1 − I0 + ∇I1 u ∇I1
                                    T
                                                 .           depth maps on the different sequences and we ray-tracing of a
                                                                                                                obtain the
            ∂ xi x∈D                        ∂ xi                                                                                     
                                                             translation vector estimates as listed
                                                                                               rz r Zin Table 1. For sim-

    The partial For general camerato the motion we canwe only show the12 by itera- deviation of rthe Z
                derivatives of u with respect motion,        plicity solve Eq. mean and standard
                                                                                            rz + r Ztz         eratedr     0 multiple
                                                                                                                                      
                                                                                   
                                                                                                                              i.e. 
                                                                                                                         z
parameters are given by the Jacobian matrix                                  Ju =
                                                                              T
                                                             normalized vectors.                 0             lation,rryZtz r Zty for .
         tive methods, e.g. Levenberg-Marquardt or gradientfrom Zmotionr part by−r r Z +
                                                                  We evaluate the depth descent: x rx + Zt
                                                                                                                      rz +
                                                                                                                  using the
                                                                                       −rz r inputs. Zt )2 normalize the plane
                                                                                                                                      

         xn+1 = rxn + γ∇E(xn ) where x contains the threevectors asfor zcomparing the usedPOLYTECHNIQUE as
                                                                                        translationWe image(rz + r Ztz )2
                                                             ground truth translation             (r + r z         z
                                                           input images which is convenient                   ÉCOLE pa-
                  rz + r Ztz and E is the energy of rameters. In our residual,we use 5 levels of purpose is to ev
                    rz   Z
                                           0
         parameters                                          the image experiments,                             FÉDÉRALE DE LAUSANNE
                                                                                                                 resolution
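For the special case of camera movement parallel to the sensor plane, the 2×2 system A(x) b = c(x) derived from Eq. 12 can be assembled by accumulating the sums pixel by pixel and then solved in closed form. The following is only a minimal sketch under stated assumptions: the function name, the nested-list image layout, and the precomputed per-pixel factor `rZ` (the product r·Z) are hypothetical choices for illustration and are not taken from the slides.

```python
def estimate_parallel_translation(I0, I1, Ix, Iy, rZ):
    """Solve A b = c for b = (t_x, t_y) using the sums that define A and c.

    All arguments are same-sized 2D lists (an assumed layout):
    I0, I1 are the two frames, Ix and Iy the partial derivatives of I1,
    and rZ the per-pixel factor r * Z taken from the current depth map.
    """
    a11 = a12 = a22 = c1 = c2 = 0.0
    for i in range(len(I0)):
        for j in range(len(I0[0])):
            w, gx, gy = rZ[i][j], Ix[i][j], Iy[i][j]
            diff = I1[i][j] - I0[i][j]
            a11 += w * w * gx * gx      # sum r^2 Z^2 (dI1/dx)^2
            a12 += w * w * gx * gy      # sum r^2 Z^2 dI1/dx dI1/dy
            a22 += w * w * gy * gy      # sum r^2 Z^2 (dI1/dy)^2
            c1 -= w * gx * diff         # -sum r Z dI1/dx (I1 - I0)
            c2 -= w * gy * diff         # -sum r Z dI1/dy (I1 - I0)
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("A is singular: image gradients do not constrain t")
    # Cramer's rule for the symmetric 2x2 system.
    t_x = (c1 * a22 - a12 * c2) / det
    t_y = (a11 * c2 - a12 * c1) / det
    return t_x, t_y
```

A is singular when all image gradients point in the same direction, which is why the sketch checks the determinant before dividing.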
Depth Estimation - TV-L1

$$ \rho(Z) = I_1(x + u_0) + \nabla I_1^T ( r Z t_p - u_0 ) - I_0 . \qquad (5) $$

A depth map Z = Z(r) can be obtained by solving the following optimization problem [2]:

$$ Z^* = \arg\min_Z \sum_{x \in D} |\nabla Z| + \lambda \sum_{x \in D} |\rho(Z, I_0, I_1)| , \qquad (6) $$

where D is the discrete domain of pixels and x their position on the image. The left term in Eq. 6 represents the regularization term. Here we set it to the TV norm of Z which imposes a sparseness constraint on Z and acts edge-preserving. The right term is the data term which we set to the image residual as defined in Eq. 5. We have chosen the robust L1 norm as it has some advantages when compared to the usually employed L2 norm [9].
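To make the two terms of Eq. 6 concrete, the sketch below evaluates the linearized residual of Eq. 5 at one pixel and the TV-L1 energy of a candidate depth map. It is illustrative only: the function names, the nested-list layout, the isotropic forward-difference discretization of |∇Z|, and the representation of t_p as a 2-vector in the sensor plane are assumptions made here, not details given in the slides.

```python
import math


def linearized_residual(i1_warped, i0, grad_i1, rZ, t_p, u0):
    """Eq. 5 at one pixel: I1(x + u0) + grad(I1)^T (r Z t_p - u0) - I0.

    grad_i1, t_p and u0 are (x, y) pairs; rZ is the scalar factor r * Z.
    """
    gx, gy = grad_i1
    return (i1_warped
            + gx * (rZ * t_p[0] - u0[0])
            + gy * (rZ * t_p[1] - u0[1])
            - i0)


def tv_l1_energy(Z, rho, lam):
    """Eq. 6 objective: sum |grad Z| + lam * sum |rho| over all pixels.

    Z and rho are same-sized 2D lists. |grad Z| uses forward differences,
    set to zero at the last row and column (an assumed discretization).
    """
    h, w = len(Z), len(Z[0])
    tv = 0.0
    for i in range(h):
        for j in range(w):
            dx = Z[i][j + 1] - Z[i][j] if j < w - 1 else 0.0
            dy = Z[i + 1][j] - Z[i][j] if i < h - 1 else 0.0
            tv += math.hypot(dx, dy)
    data = sum(abs(r) for row in rho for r in row)
    return tv + lam * data
```

A flat depth map with zero residual has zero energy; depth edges raise the first term and brightness mismatches raise the second, with λ setting the balance between them.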
Functional Splitting

Eq. 6 is not a strictly convex optimization problem and thus hard to solve. From [2] and [9] we know that a convex relaxation can be formulated:

$$ Z^* = \arg\min_Z \sum_{x \in D} |\nabla Z| + \frac{1}{2\theta} \sum_{x \in D} (V - Z)^2 + \lambda \sum_{x \in D} |\rho(V)| , \qquad (7) $$

where V is a close approximation of Z and for θ → 0 we have V → Z. We solve Eq. 7 using an alternating two-step iteration scheme:

1. For fixed Z, solve for V:

$$ V^* = \arg\min_V \sum_{x \in D} \frac{1}{2\theta} (V - Z)^2 + \lambda \sum_{x \in D} |\rho(V)| . \qquad (8) $$

2. For fixed V, solve for Z:

$$ Z^* = \arg\min_Z \sum_{x \in D} |\nabla Z| + \frac{1}{2\theta} \sum_{x \in D} (V - Z)^2 . \qquad (9) $$
central projection on the object plane parallel to the sensor
                                    T
     ρ(Z) = I1 (x +and)thus hard r Ztp − u0 ) −[2]. and (5)
 timization problem    u0 + ∇I1 ( to solve. From I0
                 plane is given by:
                                                                                          Z = arg min ∑ |∇Z| +
                                                                                                                1
                                                                                                                          − n+1 ∑
                                                                                                                                    pn + τ
                                                                                                                                         11

 ] we know that2: convex relaxation can be formulated:
                                                                                           ∗
                                                                                                                     ∑ (V pZ) + λ= |ρ(V )|,
                                                                                                                             2

        Figure a side view with the projections of camera motion
depth map Z = Z(r)tp = be obtained by solving (2) fol-
                   can z ·   r    r + r Z(r)t r       the
                                                                                                   Z x∈D       2θ   x∈D          x∈D
                                                                                                                                     1 +(7)τ∇
wing optimizationEstimation -
     Depth problem [2]:r Z(r) rz + r Z(r)tz TV-L1
                                           −        .
                                             r Z(r)       where V is a close approximation of Z and for θ → 0 we
                                                          have V → Z. In the discretean alternative two-step
                                                                       We solve Eq. 7 using domain the st
                                1
                          Let us define Eq. 2 as the parallel projection. The optical
                                                                                  iteration scheme:
  ∗
 r-      Z ∗ Z ∑ |∇Z| ∑2θ and λ ∑ + to , ),
Z = arg min arg min + |∇Z| + thusZ) ρ(Z, I∑I |ρ(VFrom(6) fixed lution depends on the implem
       optimization problem ∑ (V − hard λ solve. )|, 1. [2] and solve for V:
              =
                                                            2
                          flow can be approximated by the following projection on the
                          sensor plane:                    |                          For       Z,
                                                                      0 1 |
he     [9] we x∈D Zthat a convex rrelaxation can be formulated: erators. In Eq. 11, ∇ represen
                know x∈D x∈D + x∈D                 r Z(r)t
                                                                 x∈D
or                                   u = rz ·               − r.           (3) (7)        V ∗ = and the scalar + λ ∑ |ρ(V )|.
                                                                                                 arg min ∑
                                                                                                              1
                                                                                                                (V − Z)2 product with  (8)
                                              rz + r Z(r)tz
here D is the discrete domain of dependencyand x θ →opti- we
here  V Functional Splitting the nonlinear of Z and the estimated 0
         is a closeEq.3 shows
                      approximation pixels of for their position                                      V x∈D 2θ
                                                                                                gence operator as defined in x∈D

                                                1
 ve Vimage. We solve on the depth map Z(r) asrepresents the regular- fixed map can be recovered by Z =
  the →ZZ.= Themin ∑ |∇Z|sensor plane. alternative form, it is |ρ(V )|,
                    cal flow Eq. 7 using an well as on the translation tz
                                                                    2 two-step 2. For
                arg perpendicular to the + 6∑ this nonlinear + λ ∑
                                                                                                V, solve for Z:
            ∗        left term in Eq.                 In (V − Z)
 ration scheme: difficult to include the projection in a variational framework.
                     we x∈D combining 2θ x∈D
                      Z
 tion term. HereNevertheless,it to the TVand 3 we find aZ which im- Z∗depth positivity ∑ (V − Z)2. (9)h
                           set                 Eqs. 2
                                                       norm of linearized∈D x                     = arg min ∑ |∇Z| +
                                                                                                                        1
                                                                                                                            constraint
                                                                                                                       2θ x∈D
2)For fixed Z, solve for V: on Zparallel projection and the optical Eq. 8 can (7) by the following soft-thresholding:
                                                                                                         Z x∈D
 ses a sparsenessrelationship between the and acts edge-preserving. be solved depth map, i.e. if Z(r)
                     constraint                                                                 ered
       where
      Fixed Z:        a data term = r Z(r)tp .
                    flow:
                                          u which we
                                                                 Z to for θ →
 e right termVisisthe close approximation ofset andthe image 0we provide global convergenc
                                                                           (4)                  to
 idual have V → Z. We3.solve1 DEPTH FROM MOTION robust two-step r detail inρ(Z) <depth ∇I1T tp)2 Z
al      as defined in Eq. 5. We have chosen alternative L1  λof ∇I1T tTp if the −λ θ ( r map
                              1 TV-L Eq. 2 7 using an the                                   θ
                         advantages when ∑ the camera the
           has some ∑ 2θ the − Z) + λcompared to ⇒ usu-  ing ∇I tp if |ρ(Z)|> λ (( rr ∇I1Tttpp))2
         ∗
       iteration scheme:
 rm asVit = arg min assume for(V moment that we know |ρ(V )|. trans-(8)V = Z +  −λρ(Z) r a 1multi-scaleλθθresolution .
                                                                                                   θ             if ρ(Z)          T  2
he                 VWe                                                                                                    ≤     ∇I1
        1. For L2 lation∈Dsolve for two 6 is not a strictly convex
 y employed      fixed x parameters t Eq. successive∈D I0 and I1 . Fur-
                    normZ, [9]. for V:                  x frames
                                                                                                use downsampled images (10)0
                                                                                                r ∇I tT
                                                                                                      1 p                              kI
                          thermore, we assume that the brightness does not change be-
                                                                                In order to solve Eq. 9, the dual formulation of the TV norm
   For fixed V, solve for Z:
                          tween those images. Using the definition of optical flow and
                                                                                can be exploited. It is given by: TV (Z) = max{p · ∇Z :
                          the projection in Eq. 4, we can express the image residual
3)                        ρ(Z) as in [6]:  1                                     p ≤ 1}. With the introduced dual variable p, Eq. 9 can
                 ∗
           Z = arg min ∑= |∇Z| 2θ ∇I
                          ρ(Z)V I1
                                     ∑ + (V1 ∑ Ztp −0Z) I∑ (5)(9)be solved (8) bypnthe τ∇(∇ · pn −V /θ ) [3, 2]:
                                          0
                                                    −      2
            ∗ V = arg min (x + u ) +1 T ( r Z) − u λ−2 . |ρ(V )|.
                                                             +
                                                        (V ) x0∈D      .
                                                                                           iteratively
                                                                                                           +
Chambolle algorithm

For fixed V, Eq. 9 can be solved iteratively by the Chambolle algorithm [3, 2], which updates the dual variable p:

$$p^{n+1} = \frac{p^n + \tau \nabla(\nabla \cdot p^n - V/\theta)}{1 + \tau \|\nabla(\nabla \cdot p^n - V/\theta)\|}. \tag{11}$$

In the discrete domain the stability and properties of the solution depend on the implementation of the differential operators. In Eq. 11, ∇ represents the discrete gradient operator and the scalar product with ∇ represents the discrete divergence operator as defined in [3]. From Eq. 11, the depth map can be recovered by Z = V − θ ∇ · p. Furthermore, the depth positivity constraint has to be imposed on the recovered depth map, i.e. if Z(r) < 0 we set Z(r) ← 0.
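The iteration of Eq. 11 and the recovery Z = V − θ ∇ · p can be sketched in plain Python as below. This is a minimal illustration, not the authors' implementation: the function names, the nested-list image layout, the iteration count and the step size τ = 1/8 are assumptions made for the example. The gradient uses forward differences and the divergence uses backward differences, so that the divergence is the negative adjoint of the gradient, which is the discretisation commonly attributed to [3].

```python
import math

def grad(z):
    """Forward differences; zero across the last column and the last row."""
    h, w = len(z), len(z[0])
    gx = [[z[i][j + 1] - z[i][j] if j < w - 1 else 0.0 for j in range(w)] for i in range(h)]
    gy = [[z[i + 1][j] - z[i][j] if i < h - 1 else 0.0 for j in range(w)] for i in range(h)]
    return gx, gy

def div(px, py):
    """Backward differences, chosen so that div is the negative adjoint of grad."""
    h, w = len(px), len(px[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            dx = (px[i][j] if j < w - 1 else 0.0) - (px[i][j - 1] if j > 0 else 0.0)
            dy = (py[i][j] if i < h - 1 else 0.0) - (py[i - 1][j] if i > 0 else 0.0)
            out[i][j] = dx + dy
    return out

def chambolle_depth_update(v, theta, tau=0.125, n_iter=200):
    """Hypothetical helper: iterate Eq. 11 for fixed V, then return Z = V - theta * div(p)."""
    h, w = len(v), len(v[0])
    px = [[0.0] * w for _ in range(h)]
    py = [[0.0] * w for _ in range(h)]
    for _ in range(n_iter):
        d = div(px, py)
        inner = [[d[i][j] - v[i][j] / theta for j in range(w)] for i in range(h)]
        gx, gy = grad(inner)
        for i in range(h):
            for j in range(w):
                norm = math.hypot(gx[i][j], gy[i][j])
                px[i][j] = (px[i][j] + tau * gx[i][j]) / (1.0 + tau * norm)
                py[i][j] = (py[i][j] + tau * gy[i][j]) / (1.0 + tau * norm)
    d = div(px, py)
    # Depth positivity constraint: negative depths are set to zero.
    return [[max(0.0, v[i][j] - theta * d[i][j]) for j in range(w)] for i in range(h)]
```

On a constant V the dual variable stays at zero and Z equals V; on a step image the two plateaus move towards each other by an amount controlled by θ while the edge itself stays sharp.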
Depth Estimation: TV-L1

Figure 2: side view of the projections with camera motion.

The optical flow u is defined as the apparent motion of the brightness pattern between two images. The central projection on the object plane parallel to the sensor plane is given by Eq. 2. Let us define Eq. 2 as the parallel projection. The optical flow can be approximated by the following projection on the sensor plane:

$$u = r_z \cdot \frac{r + \|r\| Z(r)\, t}{r_z + \|r\| Z(r)\, t_z} - r. \tag{3}$$

Eq. 3 shows the nonlinear dependency of the estimated optical flow on the depth map Z(r) as well as on the translation t_z perpendicular to the sensor plane. In this nonlinear form, it is difficult to include the projection in a variational framework. Nevertheless, combining Eqs. 2 and 3 we find a linearized relationship between the parallel projection and the optical flow:

$$u = \|r\| Z(r)\, t_p. \tag{4}$$

3. TV-L1 DEPTH FROM MOTION

We assume for the moment that we know the camera translation parameters t for two successive frames I0 and I1. Furthermore, we assume that the brightness does not change between those images. Using the definition of optical flow and the projection in Eq. 4, we can express the image residual ρ(Z) as in [6]:

$$\rho(Z) = I_1(x + u_0) + \nabla I_1^T (\|r\| Z\, t_p - u_0) - I_0. \tag{5}$$

A depth map Z = Z(r) can be obtained by solving the following optimization problem [2]:

$$Z^* = \arg\min_Z \sum_{x \in D} |\nabla Z| + \lambda \sum_{x \in D} \rho(Z, I_0, I_1), \tag{6}$$

where D is the discrete domain of pixels and x their position on the image. The left term in Eq. 6 represents the regularization term. Here we set it to the TV norm of Z, which imposes a sparseness constraint on Z and acts edge-preserving. The right term is the data term, which we set to the image residual as defined in Eq. 5. We have chosen the robust L1 norm as it has some advantages when compared to the usually employed L2 norm [9].
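The linearized image residual of Eq. 5 can be evaluated pixel by pixel, as in the following sketch. All names are hypothetical and chosen for the example: `i1_warped` stands for I1(x + u0), `grad_i1` for the two components of ∇I1 at that position, and `r_norm` for the scalar factor multiplying Z in Eq. 4, read here as the norm of r. The second helper makes explicit that the residual is affine in Z, which is the property the later thresholding step relies on.

```python
def image_residual(i1_warped, grad_i1, i0, r_norm, z, t_p, u0):
    """Per-pixel residual rho(Z) of Eq. 5 (a sketch; argument names are assumptions).

    i1_warped : I1 sampled at x + u0
    grad_i1   : (dI1/dx, dI1/dy) sampled at x + u0
    i0        : I0 sampled at x
    r_norm    : scalar factor of Eq. 4, assumed to be the norm of r
    z         : depth-map value Z(r)
    t_p       : (tx, ty) parallel projection of the translation
    u0        : (u0x, u0y) flow around which the brightness is linearized
    """
    gx, gy = grad_i1
    ux = r_norm * z * t_p[0]  # flow predicted by Eq. 4, x component
    uy = r_norm * z * t_p[1]  # flow predicted by Eq. 4, y component
    return i1_warped + gx * (ux - u0[0]) + gy * (uy - u0[1]) - i0

def residual_slope(grad_i1, r_norm, t_p):
    """d rho / d Z: the residual is affine in Z with this slope."""
    return r_norm * (grad_i1[0] * t_p[0] + grad_i1[1] * t_p[1])
```

When the flow predicted by Eq. 4 equals u0, the gradient term vanishes and the residual reduces to the plain brightness difference I1(x + u0) − I0(x).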
Functional Splitting

Eq. 6 is not a strictly convex optimization problem and thus hard to solve. From [2] and [9] we know that a convex relaxation can be formulated:

$$Z^* = \arg\min_Z \sum_{x \in D} |\nabla Z| + \frac{1}{2\theta} \sum_{x \in D} (V - Z)^2 + \lambda \sum_{x \in D} |\rho(V)|, \tag{7}$$

where V is a close approximation of Z and for θ → 0 we have V → Z. We solve Eq. 7 using an alternating two-step iteration scheme:

1. For fixed Z, solve for V:

$$V^* = \arg\min_V \frac{1}{2\theta} \sum_{x \in D} (V - Z)^2 + \lambda \sum_{x \in D} |\rho(V)|. \tag{8}$$

2. For fixed V, solve for Z:

$$Z^* = \arg\min_Z \sum_{x \in D} |\nabla Z| + \frac{1}{2\theta} \sum_{x \in D} (V - Z)^2. \tag{9}$$
Fixed Z:

For fixed Z, Eq. 8 can be solved for V by the following soft-thresholding:

$$V = Z + \begin{cases} \lambda\theta\, \|r\| \nabla I_1^T t_p & \text{if } \rho(Z) < -\lambda\theta\, (\|r\| \nabla I_1^T t_p)^2 \\ -\lambda\theta\, \|r\| \nabla I_1^T t_p & \text{if } \rho(Z) > \lambda\theta\, (\|r\| \nabla I_1^T t_p)^2 \\ -\rho(Z) / (\|r\| \nabla I_1^T t_p) & \text{if } |\rho(Z)| \le \lambda\theta\, (\|r\| \nabla I_1^T t_p)^2 \end{cases} \tag{10}$$
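The soft-thresholding of Eq. 10 follows from the fact that the residual of Eq. 5 is affine in the depth: ρ(V) = ρ(Z) + a (V − Z), with a standing for the scalar that multiplies Z in Eq. 5. Minimising the per-pixel cost of Eq. 8 then has a closed form with three cases. The sketch below implements them for a single pixel; the function name, the symbol `a` and the guard for a = 0 are assumptions made for the example and do not appear in the slides.

```python
def threshold_step(z, rho_z, a, lam, theta):
    """Per-pixel solution of Eq. 8 via the thresholding of Eq. 10 (a sketch).

    z      : current depth value Z
    rho_z  : residual rho(Z) from Eq. 5
    a      : slope of the residual in Z (the factor multiplying Z in Eq. 5)
    lam    : data-term weight lambda
    theta  : coupling parameter theta
    """
    bound = lam * theta * a * a
    if rho_z < -bound:
        return z + lam * theta * a
    if rho_z > bound:
        return z - lam * theta * a
    if a == 0.0:
        # Assumption for this example: no image gradient along t_p means
        # the pixel carries no depth information, so Z is left unchanged.
        return z
    return z - rho_z / a
```

In the first two cases the depth moves by a fixed step towards a smaller residual; in the third case the step is small enough to bring the linearized residual exactly to zero.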
Fixed V:

TV-L2 denoising scheme

In order to solve Eq. 9, the dual formulation of the TV norm can be exploited. It is given by:

$$TV(Z) = \max\{\, p \cdot \nabla Z : \|p\| \le 1 \,\}.$$

With the introduced dual variable p, Eq. 9 can be solved iteratively by the Chambolle algorithm [3, 2].
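The dual formulation of the TV norm can be checked numerically on a small image. The sketch below is only an illustration with hypothetical function names: it computes the TV norm directly as the sum of gradient magnitudes, and then evaluates the dual objective for the field p = ∇Z / |∇Z| (zero where the gradient vanishes), which satisfies the constraint and attains the maximum. Forward differences are assumed for the discrete gradient.

```python
import math

def forward_gradient(z):
    """Per-pixel (gx, gy) by forward differences, zero at the last column and row."""
    h, w = len(z), len(z[0])
    return [[(z[i][j + 1] - z[i][j] if j < w - 1 else 0.0,
              z[i + 1][j] - z[i][j] if i < h - 1 else 0.0)
             for j in range(w)] for i in range(h)]

def tv_primal(z):
    """TV norm as the sum over pixels of the gradient magnitude."""
    return sum(math.hypot(gx, gy) for row in forward_gradient(z) for gx, gy in row)

def dual_objective(z, p):
    """Sum over pixels of p . grad(Z) for a dual field p given as rows of (px, py)."""
    g = forward_gradient(z)
    return sum(px * gx + py * gy
               for grow, prow in zip(g, p)
               for (gx, gy), (px, py) in zip(grow, prow))

def maximising_p(z):
    """Unit-length field aligned with the gradient; zero where the gradient is zero."""
    out = []
    for row in forward_gradient(z):
        new_row = []
        for gx, gy in row:
            n = math.hypot(gx, gy)
            new_row.append((gx / n, gy / n) if n > 0.0 else (0.0, 0.0))
        out.append(new_row)
    return out
```

Any other field with pointwise norm at most 1 gives a dual objective no larger than the TV norm, which is what allows Eq. 9 to be rewritten as a problem in p.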
In order to provide global convergence and to handle different levels of detail in the depth map we propose solving Eq. 7 using a multi-scale resolution approach. This means that we use downsampled images I0 and I1 of L different
                                                                                                                                              0      sizesÉCOLE Furthermore,
                                                                                                                                                        k
                                                                                                                                                            1
                                                                                                                                                              set
               8ization term. Here we byit to the TV norm of soft-thresholding:
                         be                                                         which1im-                                                               imposed LAUSANNE
12


Joint Depth and Ego-motion Estimation
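For translation parallel to the sensor plane, the ego-motion half of the joint scheme reduces to a 2×2 linear system A b = c with b = (tx, ty). The sketch below is one possible way to assemble and solve it in plain Python. It assumes central differences for ∇I1, images stored as lists of rows, and per-pixel maps for ‖r‖ and Z; the function name is hypothetical and this is not the authors' GLSL implementation.

```python
def estimate_planar_translation(i0, i1, z, r_norm):
    """Least-squares estimate of (tx, ty) for camera motion with tz = 0.

    i0, i1 : consecutive frames as lists of rows of floats
    z      : depth map Z (inverse distance), same shape as the frames
    r_norm : per-pixel norm ||r|| of the sensor point, same shape
    """
    h, w = len(i0), len(i0[0])
    a11 = a12 = a22 = c1 = c2 = 0.0
    # Border pixels are skipped so that central differences are defined.
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix = 0.5 * (i1[y][x + 1] - i1[y][x - 1])
            iy = 0.5 * (i1[y + 1][x] - i1[y - 1][x])
            s = r_norm[y][x] * z[y][x]      # scale factor ||r|| Z
            it = i1[y][x] - i0[y][x]        # temporal difference I1 - I0
            a11 += s * s * ix * ix
            a12 += s * s * ix * iy
            a22 += s * s * iy * iy
            c1 -= s * ix * it
            c2 -= s * iy * it
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("degenerate system: image gradients do not constrain t")
    # Cramer's rule for the symmetric 2x2 system.
    tx = (a22 * c1 - a12 * c2) / det
    ty = (a11 * c2 - a12 * c1) / det
    return tx, ty
```

The explicit determinant check is a design choice for the sketch: a frame with gradients in a single direction cannot determine both translation components.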




13


Joint Depth and Ego-motion Estimation
Multiple images / video sequence
 • Propagate the solution across frames
 • Initialize the motion parameters with zero
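The two points above can be sketched as a loop over consecutive frame pairs. The reading of "propagate solution" as carrying the depth map forward is an assumption, as are the solver callbacks `solve_motion` and `solve_depth`, the parameter `n_alternations`, and the choice to update the motion before the depth; none of these names come from the slides.

```python
def run_sequence(frames, solve_motion, solve_depth, z_init, n_alternations=2):
    """Alternate motion and depth estimation over a video sequence.

    frames         : list of images in temporal order
    solve_motion   : hypothetical callback (i0, i1, z, t) -> t
    solve_depth    : hypothetical callback (i0, i1, z, t) -> z
    z_init         : depth map used for the first frame pair
    n_alternations : fixed number of motion/depth alternations per pair
    """
    z = z_init
    results = []
    for i0, i1 in zip(frames, frames[1:]):
        # Motion parameters are initialised with zero for every pair.
        t = (0.0, 0.0, 0.0)
        for _ in range(n_alternations):
            t = solve_motion(i0, i1, z, t)
            z = solve_depth(i0, i1, z, t)
        # The depth map z is carried over to the next pair (assumed meaning
        # of "propagate solution").
        results.append((z, t))
    return results
```

Passing the solvers in as callbacks keeps the loop independent of how depth and motion are actually computed.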




14


GPU Implementation
Image processing on the GPU:
                                      ‣   Host : I/O + render calls


                                      ‣   Device: parallel processing per pixel by
                                          fragment shading kernels




• Fixed number of iterations and parameters
• Non-optimized code
• We reach a performance of 5 fps
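Each solver update reads only a pixel and its direct neighbours, which is what lets it run as a per-pixel kernel with a fixed iteration count. As an illustration, here is a plain-Python sketch of the dual update of Eq. 11 followed by the recovery Z = V − θ ∇·p and the positivity constraint. The forward-difference gradient, the matching divergence, the step size `tau = 0.125` and all function names are assumptions for the sketch, not details of the GLSL shaders.

```python
def grad(u):
    """Forward differences with zero gradient at the last column and row."""
    h, w = len(u), len(u[0])
    gx = [[u[i][j + 1] - u[i][j] if j < w - 1 else 0.0 for j in range(w)]
          for i in range(h)]
    gy = [[u[i + 1][j] - u[i][j] if i < h - 1 else 0.0 for j in range(w)]
          for i in range(h)]
    return gx, gy


def div(px, py):
    """Discrete divergence, the negative adjoint of grad above."""
    h, w = len(px), len(px[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            d = 0.0
            if j < w - 1:
                d += px[i][j]
            if j > 0:
                d -= px[i][j - 1]
            if i < h - 1:
                d += py[i][j]
            if i > 0:
                d -= py[i - 1][j]
            out[i][j] = d
    return out


def tv_denoise(v, theta, n_iter=50, tau=0.125):
    """Fixed-iteration dual update, then Z = max(V - theta * div p, 0)."""
    h, w = len(v), len(v[0])
    px = [[0.0] * w for _ in range(h)]
    py = [[0.0] * w for _ in range(h)]
    for _ in range(n_iter):
        d = div(px, py)
        q = [[d[i][j] - v[i][j] / theta for j in range(w)] for i in range(h)]
        gx, gy = grad(q)
        for i in range(h):
            for j in range(w):
                norm = (gx[i][j] ** 2 + gy[i][j] ** 2) ** 0.5
                px[i][j] = (px[i][j] + tau * gx[i][j]) / (1.0 + tau * norm)
                py[i][j] = (py[i][j] + tau * gy[i][j]) / (1.0 + tau * norm)
    d = div(px, py)
    # Positivity constraint: negative depth values are set to zero.
    return [[max(v[i][j] - theta * d[i][j], 0.0) for j in range(w)]
            for i in range(h)]
```

The inner double loop is the part that would map to one fragment-shader invocation per pixel; the outer loop corresponds to the fixed number of render passes.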


15


Results
(the synthetic room sequence – Camera moving left)




16




Results
                              Depth from motion
                              (GT = ground truth)

   Sequence            MSE
   movement in x       4.5 · 10^-4
   movement in y       2.7 · 10^-4
   movement in z       3.1 · 10^-4
   x+y+z               3 · 10^-4
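The MSE figures compare an estimated depth map with its ground truth. A minimal sketch of that measure follows, assuming the plain per-pixel mean squared error over maps stored as lists of rows; the slides do not state how the maps were scaled or normalized before comparison, so that step is left out.

```python
def mse(estimate, ground_truth):
    """Mean squared error between two depth maps of identical shape."""
    if len(estimate) != len(ground_truth) or any(
        len(a) != len(b) for a, b in zip(estimate, ground_truth)
    ):
        raise ValueError("depth maps must have the same shape")
    n = sum(len(row) for row in estimate)
    total = sum(
        (e - g) ** 2
        for row_e, row_g in zip(estimate, ground_truth)
        for e, g in zip(row_e, row_g)
    )
    return total / n
```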




17




Results
         Joint ego-motion and depth estimation
         (GT = ground truth)

   Panel   Sequence            MSE
   1       movement in x       6.2 · 10^-4
   2       movement in x       10.1 · 10^-4
   3       movement in z       4.1 · 10^-4
   4       movement in z       3.9 · 10^-4




18




Conclusions and Future Perspectives
   We propose a new SfM algorithm based on the TV-L1 model
    -   Joint estimation of motion parameters and dense depth map
    -   Fast and highly parallelizable
    -   (Almost) real-time implementation (5 fps) based on GLSL



   Limitations
    -   No camera rotation
    -   Tested only on synthetic images


   Future work
    -   Include rotation
    -   Include a robust tracker for the ego-motion




19




Thank you for your attention

         Questions?






Fast Structure From Motion in Planar Image Sequences

  • 1. Fast Structure from Motion for Planar Image Sequences. Andreas Weishaupt, Luigi Bagnato, Pierre Vandergheynst. ÉCOLE POLYTECHNIQUE FÉDÉRALE DE LAUSANNE
  • 2. Depth Map
  • 3-5. Motion; Depth Map
  • 6-7. Motivations: Cinema 3D, Philips 2D+Depth; Autonomous Navigation (SLAM); 3D scanning/modeling. Target: real-time performance, good accuracy
  • 8-11. Problem Formulation. We consider only 2 consecutive frames I0 and I1. Brightness Consistency Equation (BCE): \(I_1(x + u) - I_0(x) = 0\). If we assume that the motion between frames is small, linearization gives \(I_1(x) + \nabla I_1^T(x)\,u - I_0(x) \approx 0\). Symbols: f: focal distance; t: camera translation; d: distance/depth; Ω: camera rotation; u: optical flow
  • 12. Problem Formulation - Projection Model

    They rely on finding pairs of corresponding points in successive images. This has the following consequences:
    - The final result depends on the quality of the found correspondence. If the match is not exact, the reconstruction will not be accurate.
    - Finding correspondences is a computationally expensive task: dense reconstruction cannot be performed in real time.
    - For real-time reconstruction, the recovery has to be limited to a few feature points. Often, tracking of the found feature points is employed to reduce computation cost. Tracking introduces an additional source of error for 3D reconstruction.

    Another class of recent methods to obtain dense depth maps is based on the fusion of sparse depth maps by image registration techniques, e.g. [8]. Those have the advantage that traditional structure from motion systems can be employed. However, in order to provide accurate results, a large number of depth maps has to be input for such methods.

    2. MOTION IN PLANAR IMAGE SEQUENCES

    The optical flow u is defined as the apparent motion of the brightness pattern between two images. We model the camera movement during acquisition of successive frames by the rigid 3D translation \(t = (t_x, t_y, t_z)^T\). Consequently, during camera motion \(p = (X, Y, Z)^T\) becomes \(p' = p - \Delta p = p - t\). We parametrize \(p = d(r)\,e_r\), where \(r = (x, y, f)\) is a point on the sensor plane, f the focal distance and d(r) the distance or depth from the optical center to a point in the scene. We denote by Z the inverse of d; Z is the depth map.

    Figure 1: pinhole camera model and rigid camera motion.
    Figure 2: a side view with the projections of camera motion.

    The central projection on the object plane parallel to the sensor plane is given by:

    \[
    t_p = \frac{r_z}{\|r\| Z(r)} \cdot \frac{r + \|r\| Z(r)\,t}{r_z + \|r\| Z(r)\,t_z} - \frac{r}{\|r\| Z(r)} \tag{2}
    \]

    Let us define Eq. 2 as the parallel projection. The optical flow can be approximated by the following projection on the sensor plane:

    \[
    u = r_z \cdot \frac{r + \|r\| Z(r)\,t}{r_z + \|r\| Z(r)\,t_z} - r \tag{3}
    \]

    Eq. 3 shows the nonlinear dependency of the estimated optical flow on the depth map Z(r) as well as on the translation \(t_z\) perpendicular to the sensor plane. In this nonlinear form, it is difficult to include the projection in a variational framework. Nevertheless, combining Eqs. 2 and 3 we find a linearized relationship between the parallel projection and the optical flow:

    \[
    u = \|r\|\,Z(r)\,t_p \tag{4}
    \]
  • 13. Problem formulation

    BC Equation: \(I_1(x) + \nabla I_1^T(x)\,u - I_0(x) \approx 0\)
    Projection Model: \(u = \|r\|\,Z(r)\,t_p\) (4)
    Linearization with respect to a known \(u_0\) gives the data term, bilinear in Z and p:

    \[
    \rho(Z) = I_1(x + u_0) + \nabla I_1^T\,(\|r\| Z t_p - u_0) - I_0 \tag{5}
    \]

    If we know the motion t: we cast the depth estimation as a TV-L1 optimization problem:

    \[
    Z^* = \arg\min_Z \sum_{x \in D} |\nabla Z| + \lambda \sum_{x \in D} |\rho(Z, I_0, I_1)| \tag{6}
    \]

    If we have an estimate of Z: we estimate the ego-motion by least squares:

    \[
    t^* = \arg\min_t \sum_{x \in D} \left( I_1 - I_0 + \nabla I_1^T\,(\|r\| Z t_p - u_0) \right)^2
    \]

    3. TV-L1 DEPTH FROM MOTION

    We assume for the moment that we know the camera translation parameters t for two successive frames I0 and I1. Furthermore, we assume that the brightness does not change between those images. Using the definition of optical flow and the projection in Eq. 4, we can express the image residual ρ(Z) as in [6], which gives Eq. 5. A depth map Z = Z(r) can be obtained by solving the optimization problem of Eq. 6 [2].
  • 14. Camera Ego-Motion Estimation

    4. EGO-MOTION ESTIMATION

    Let us assume now that we have an estimate of the depth map Z. Given two successive images I0 and I1 we can recover the camera translation parameters by optimizing the L2 norm of the image residual with respect to t:

    \[
    t^* = \arg\min_t \sum_{x \in D} \left( I_1 - I_0 + \nabla I_1^T\,(\|r\| Z t_p - u_0) \right)^2
    \]

    and

    \[
    \sum_{x \in D} \left( I_1 - I_0 + \|r\| Z\,t_p^T \nabla I_1 \right) \|r\| Z\,\nabla I_1 = 0 \tag{12}
    \]

    Special case \(t = (t_x, t_y, 0)^T\): for camera movement parallel to the sensor plane, solving Eq. 12 results in the linear system \(A(x)\,b = c(x)\) with \(b = (t_x, t_y)^T\) and

    \[
    A(x) =
    \begin{pmatrix}
    \sum_{x \in D} \|r\|^2 Z^2 \left(\frac{\partial I_1}{\partial x}\right)^2 &
    \sum_{x \in D} \|r\|^2 Z^2 \frac{\partial I_1}{\partial x}\frac{\partial I_1}{\partial y} \\
    \sum_{x \in D} \|r\|^2 Z^2 \frac{\partial I_1}{\partial x}\frac{\partial I_1}{\partial y} &
    \sum_{x \in D} \|r\|^2 Z^2 \left(\frac{\partial I_1}{\partial y}\right)^2
    \end{pmatrix},
    \qquad
    c(x) =
    \begin{pmatrix}
    -\sum_{x \in D} \|r\| Z \frac{\partial I_1}{\partial x} (I_1 - I_0) \\
    -\sum_{x \in D} \|r\| Z \frac{\partial I_1}{\partial y} (I_1 - I_0)
    \end{pmatrix}
    \]

    General case \(t = (t_x, t_y, t_z)^T\): for general camera motion, we can solve Eq. 12 by iterative methods, e.g. Levenberg-Marquardt or gradient descent: \(x^{n+1} = x^n + \gamma \nabla E(x^n)\), where x contains the three translation parameters and E is the energy of the image residual, with

    \[
    \frac{\partial E}{\partial x_i} = \sum_{x \in D} \left( I_1 - I_0 + \nabla I_1^T u \right) \nabla I_1^T \frac{\partial u}{\partial x_i}
    \]

    The partial derivatives of u with respect to the motion parameters are given by the Jacobian matrix (rows for \(u_x\) and \(u_y\), following from Eq. 3):

    \[
    J_u =
    \begin{pmatrix}
    \frac{r_z \|r\| Z}{r_z + \|r\| Z t_z} & 0 & -\frac{r_z \|r\| Z\,(r_x + \|r\| Z t_x)}{(r_z + \|r\| Z t_z)^2} \\
    0 & \frac{r_z \|r\| Z}{r_z + \|r\| Z t_z} & -\frac{r_z \|r\| Z\,(r_y + \|r\| Z t_y)}{(r_z + \|r\| Z t_z)^2}
    \end{pmatrix}
    \]

    Joint estimation: we must show how to combine the depth from motion estimation described in Section 3 and the ego-motion estimation described in Section 4. Since both parts rely on each other, it is very likely that we can combine them by performing alternating depth and ego-motion estimation. We find that it is best to include the alternation scheme in the multi-scale approach:
    1. At the coarsest resolution level L we initialize \({}^Lt\) by zero and \({}^LZ\) by some small constant. We can first solve for \({}^LZ\) as explained in Section 3. Since the ego-motion parameters are zero, the estimated depth map will be flat.
    2. With the flat depth map as input, we estimate the motion parameters according to Section 4.
    3. Given the estimated motion parameters \({}^{k+1}t\) and the depth map \({}^{k+1}Z\), we first estimate the optical flow \(u_0 = \|r\|\,Z(r)\,t_p\), then we compute the depth map at level k, \({}^kZ\).
    4. From the refined depth map \({}^kZ\), we compute the motion parameters \({}^kt\).
    5. Steps 3 and 4 are repeated until the finest resolution is reached and the final depth map \({}^0Z\) is obtained.

    6. RESULTS

    In order to verify our approach, we use synthetic images of size 512 × 512 and ground truth depth maps generated by ray-tracing of a 3D model of a living room. We have generated multiple sequences for various types of camera translation, i.e. for movement parallel and perpendicular to the image plane as well as for linear combinations of both. The first purpose is to evaluate ego-motion estimation and depth from motion separately. We run the ego-motion estimation with ground truth depth maps on the different sequences and we obtain the translation vector estimates as listed in Table 1. For simplicity we only show the mean and standard deviation of the normalized vectors. We evaluate the depth from motion part by using the ground truth translation vectors as inputs. We normalize the input images, which is convenient for comparing the used parameters. In our experiments, we use 5 levels of resolution.
  • 15. TV-L1 Depth Estimation

    \[
    \rho(Z) = I_1(x + u_0) + \nabla I_1^T\,(\|r\| Z t_p - u_0) - I_0 \tag{5}
    \]

    A depth map Z = Z(r) can be obtained by solving the following optimization problem [2]:

    \[
    Z^* = \arg\min_Z \sum_{x \in D} |\nabla Z| + \lambda \sum_{x \in D} |\rho(Z, I_0, I_1)| \tag{6}
    \]

    where D is the discrete domain of pixels and x their position on the image. The left term in Eq. 6 represents the regularization term. Here we set it to the TV norm of Z, which imposes a sparseness constraint on Z and acts edge-preserving. The right term is the data term, which we set to the image residual as defined in Eq. 5. We have chosen the robust L1 norm as it has some advantages when compared to the usually employed L2 norm [9]. Eq. 6 is not a strictly convex optimization problem.

    The dual variable p is updated iteratively:

    \[
    p^{n+1} = \frac{p^n + \tau \nabla(\nabla \cdot p^n - V/\theta)}{1 + \tau\,|\nabla(\nabla \cdot p^n - V/\theta)|} \tag{11}
    \]

    In the discrete domain the solution depends on the implementation of the operators. In Eq. 11, \(\nabla\) represents the discrete gradient operator and the scalar product with \(\nabla\) represents the discrete divergence operator as defined in [3]. The depth map can be recovered by \(Z = V - \theta\,\nabla \cdot p\). The positivity constraint has to be imposed on the recovered depth map, i.e. if \(Z(r) < 0\) we set \(Z(r) \leftarrow 0\). To provide global convergence and to handle different levels of detail in the depth map Z, a multi-scale resolution approach with downsampled images \({}^kI_0\) and \({}^kI_1\) is used.
  • 18. TV-L1 Depth Estimation - Functional Splitting

    Eq. 6 is not a strictly convex optimization problem and thus hard to solve. From [2] and [9] we know that a convex relaxation can be formulated:

    \[
    Z^* = \arg\min_Z \sum_{x \in D} |\nabla Z| + \frac{1}{2\theta} \sum_{x \in D} (V - Z)^2 + \lambda \sum_{x \in D} |\rho(V)| \tag{7}
    \]

    where V is a close approximation of Z and for \(\theta \to 0\) we have \(V \to Z\). We solve Eq. 7 using an alternating two-step iteration scheme:

    1. For fixed Z, solve for V:

    \[
    V^* = \arg\min_V \frac{1}{2\theta} \sum_{x \in D} (V - Z)^2 + \lambda \sum_{x \in D} |\rho(V)| \tag{8}
    \]

    2. For fixed V, solve for Z:

    \[
    Z^* = \arg\min_Z \sum_{x \in D} |\nabla Z| + \frac{1}{2\theta} \sum_{x \in D} (V - Z)^2 \tag{9}
    \]

    Eq. 8 can be solved by soft-thresholding.
  • 19. Fixed Z:
    Eq. 8 can be solved by the following soft-thresholding:
      $V = Z + \begin{cases} \lambda\theta\, \|r\| \nabla I_1^T t_p & \text{if } \rho(Z) < -\lambda\theta\, (\|r\| \nabla I_1^T t_p)^2 \\ -\lambda\theta\, \|r\| \nabla I_1^T t_p & \text{if } \rho(Z) > \lambda\theta\, (\|r\| \nabla I_1^T t_p)^2 \\ -\dfrac{\rho(Z)}{\|r\| \nabla I_1^T t_p} & \text{if } |\rho(Z)| \le \lambda\theta\, (\|r\| \nabla I_1^T t_p)^2 \end{cases}$   (10)
    Figure 2: side view with the projections of camera motion.
    The optical flow u is defined as the apparent motion of the brightness pattern between two images. The central projection on the object plane parallel to the sensor plane is given by:
      $t_p = \frac{r_z}{\|r\| Z(r)} \cdot \frac{r + \|r\| Z(r)\, t}{r_z + \|r\| Z(r)\, t_z} - \frac{r}{\|r\| Z(r)}$.   (2)
    Let us define Eq. 2 as the parallel projection. The optical flow can be approximated by the following projection on the sensor plane:
      $u = r_z \cdot \frac{r + \|r\| Z(r)\, t}{r_z + \|r\| Z(r)\, t_z} - r$.   (3)
    Eq. 3 shows the nonlinear dependency of the estimated optical flow on the depth map Z(r) as well as on the translation t_z perpendicular to the sensor plane. In this nonlinear form, it is difficult to include the projection in a variational framework. Nevertheless, combining Eqs. 2 and 3 we find a linearized relationship between the parallel projection and the optical flow:
      $u = \|r\| Z(r)\, t_p$.   (4)
    3. TV-L1 DEPTH FROM MOTION
    We assume for the moment that we know the camera translation parameters t for two successive frames I0 and I1. Furthermore, we assume that the brightness does not change between those images. Using the definition of optical flow and the projection in Eq. 4, we can express the image residual ρ(Z) as in [6] (Eq. 5).
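The thresholding rule of Eq. 10 can be sketched per pixel as follows. This is a minimal sketch under stated assumptions: the residual `rho` evaluated at the current Z and the scalar factor `a` (the per-pixel product of the position norm, the image gradient and t_p) are assumed to be precomputed arrays, and the function and parameter names are hypothetical.

```python
import numpy as np

def threshold_step(Z, rho, a, lam, theta):
    """Pointwise solution of Eq. 8 for fixed Z, following Eq. 10.

    Z     : current depth map
    rho   : image residual rho(Z), same shape as Z (assumed precomputed)
    a     : scalar factor multiplying Z in the residual (assumed precomputed)
    lam   : data-term weight (lambda)
    theta : coupling parameter
    """
    bound = lam * theta * a**2
    # Avoid dividing by zero where the gradient term vanishes; there the
    # bound is zero as well, so the resulting step is zero in every branch.
    safe_a = np.where(a == 0, 1.0, a)
    step = np.where(rho < -bound, lam * theta * a,
           np.where(rho > bound, -lam * theta * a,
                    -rho / safe_a))
    return Z + step
```

In the first two branches the step has fixed size λθ|a| and moves V so that the residual shrinks toward zero; in the third branch the residual is small enough that V can cancel it exactly.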
  • 20. Fixed V: TV-L2 denoising scheme
    In order to solve Eq. 9, the dual formulation of the TV norm can be exploited. It is given by $TV(Z) = \max\{ p \cdot \nabla Z : \|p\| \le 1 \}$. With the introduced dual variable p, Eq. 9 can be solved iteratively by the Chambolle algorithm [3, 2]:
      $p^{n+1} = \dfrac{p^n + \tau \nabla(\nabla \cdot p^n - V/\theta)}{1 + \tau\, |\nabla(\nabla \cdot p^n - V/\theta)|}$.   (11)
    In the discrete domain, the stability and properties of the solution depend on the implementation of the differential operators. In Eq. 11, ∇ represents the discrete gradient operator and the scalar product with ∇ represents the discrete divergence operator as defined in [3]. From Eq. 11, the depth map can be recovered by $Z = V - \theta\, \nabla \cdot p$. Furthermore, the depth positivity constraint has to be imposed on the recovered depth map, i.e. if Z(r) < 0 we set Z(r) ← 0.
    In order to provide global convergence and to handle different levels of detail in the depth map Z, we propose solving Eq. 7 using a multi-scale resolution approach. This means that we use downsampled images ${}^k I_0$ and ${}^k I_1$ of different sizes.
  • 21. 12 Joint Depth and Ego-motion Estimation
  • 22. 13 Joint Depth and Ego-motion Estimation Multiple images / video sequence ‣ propagate solution ‣ initialize motion parameters with zero
  • 23. 14 GPU Implementation Image processing on the GPU: ‣ Host: I/O + render calls ‣ Device: parallel processing per pixel by fragment shading kernels • Fixed number of iterations and parameters • Non-optimized code • We reach a performance of 5 fps
  • 24. 15 Results (the synthetic room sequence – Camera moving left)
  • 25. 16 Results: Depth from motion (estimated depth maps shown next to ground truth, GT) ‣ movement in x: MSE 4.5 · 10^-4 ‣ movement in y: MSE 2.7 · 10^-4 ‣ movement in z: MSE 3.1 · 10^-4 ‣ movement in x+y+z: MSE 3 · 10^-4
  • 26. 17 Results: Joint ego-motion and depth estimation (shown next to ground truth, GT) ‣ Panels: movement in x (two panels), movement in z (two panels) ‣ MSE values in slide order: 6.2 · 10^-4, 10.1 · 10^-4, 4.1 · 10^-4, 3.9 · 10^-4
  • 27. 18 Conclusions and Future Perspectives
    ‣ We propose a new SfM algorithm based on the TV-L1 model - Joint estimation of motion parameters and dense depth map - Fast and highly parallelizable - (Almost) real-time implementation (5 fps) based on GLSL
    ‣ Limitations - No camera rotation - Tested only on synthetic images
    ‣ Future work - Include rotation - Include a robust tracker for the ego-motion
  • 28. 19 Thank you for your attention Questions?