Proximal Splitting
and Optimal Transport
       Gabriel Peyré
    www.numerical-tours.com
Overview
• Optimal Transport and Imaging

• Convex Analysis and Proximal Calculus

• Forward-Backward

• Douglas-Rachford and ADMM

• Generalized Forward-Backward

• Primal-Dual Schemes
Measure Preserving Maps

Distributions µ0, µ1 on R^k.

Mass preserving map T : R^k → R^k:
    µ1 = T♯µ0   where   (T♯µ0)(A) = µ0(T^{−1}(A))

[Figure: a point x of µ0 is mapped to T(x); T pushes µ0 forward onto µ1]

Distributions with densities µi = ρi(x) dx:
    T♯µ0 = µ1   ⟺   ρ1(T(x)) |det(∂T(x))| = ρ0(x)

Optimal Transport

Lp optimal transport:
    Wp(µ0, µ1)^p = min_{T♯µ0 = µ1} ∫ ||T(x) − x||^p µ0(dx)

Regularity condition:
    µ0 or µ1 does not give mass to "small sets".

Theorem (p > 1): there exists a unique optimal T.

Theorem (p = 2): T = ∇φ with φ convex.
    T is monotone:  ⟨T(x) − T(x′), x − x′⟩ ≥ 0

[Figure: the optimal map T transporting µ0 onto µ1]

Wasserstein Distance

Couplings Π(µ, ν): measures π on R^d × R^d with marginals µ and ν,
    ∀ A ⊂ R^d,  π(A × R^d) = µ(A)
    ∀ B ⊂ R^d,  π(R^d × B) = ν(B)

Transportation cost:
    Wp(µ, ν)^p = min_{π ∈ Π(µ, ν)} ∫_{R^d × R^d} c(x, y) dπ(x, y)

[Figure: a coupling π between µ and ν, putting mass dπ(x, y) between points x and y]

Optimal Transport

Let p > 1 and assume µ does not give mass to small sets.

    Unique π ∈ Π(µ, ν)  s.t.  Wp(µ, ν)^p = ∫_{R^d × R^d} c(x, y) dπ(x, y)

Optimal transport map T : R^d → R^d:
    the optimal π is supported on the graph {(x, T(x))}.

p = 2:  T = ∇φ is the unique solution of
    φ convex l.s.c.
    (∇φ)♯µ = ν

1-D Continuous Wasserstein

Distributions µ, ν on R.

Cumulative functions:  Cµ(t) = ∫_{−∞}^{t} dµ(x)

For all p > 1:  T = Cν^{−1} ∘ Cµ
    T is non-decreasing ("change of contrast")

Explicit formulas:
    Wp(µ, ν)^p = ∫_0^1 |Cµ^{−1} − Cν^{−1}|^p
    W1(µ, ν) = ∫_R |Cµ − Cν|

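An illustrative numpy sketch of these 1-D formulas on empirical, equal-size, uniformly weighted samples (function names are mine, not from the slides); sorting plays the role of the cumulative functions:

import numpy as np

def wasserstein_1d(a, b, p=2):
    """Empirical W_p between two equal-size samples a, b on R (uniform weights):
    the optimal map pairs sorted values, i.e. T = C_b^{-1} o C_a."""
    a_s, b_s = np.sort(a), np.sort(b)
    return np.mean(np.abs(a_s - b_s) ** p) ** (1.0 / p)

def monotone_map(a, b):
    """Evaluate T = C_b^{-1} o C_a at the sample points of a."""
    a_s, b_s = np.sort(a), np.sort(b)
    ranks = np.searchsorted(a_s, a)          # rank of each a_i, i.e. C_a(a_i) up to 1/n
    return b_s[np.clip(ranks, 0, b_s.size - 1)]

# usage: W_2 between samples of N(0,1) and N(3,0.5) is close to sqrt(3**2 + 0.5**2)
rng = np.random.default_rng(0)
a, b = rng.normal(0, 1, 1000), rng.normal(3, 0.5, 1000)
print(wasserstein_1d(a, b, p=2))
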
Grayscale Histogram Transfer

Input images: fi : [0,1]² → [0,1],  i = 0, 1.

Gray-value distributions: µi defined on [0,1],
    µi([a, b]) = ∫_{[0,1]²} 1_{a ≤ fi(x) ≤ b} dx

Optimal transport:  T = Cµ1^{−1} ∘ Cµ0.
    The transferred image is T(f0), computed pixel-wise as Cµ1^{−1}(Cµ0(f0)).

[Figure: images f0, f1, their gray-value histograms µ0, µ1, and the pipeline
 f0 → Cµ0(f0) → T(f0)]

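A possible sort-based implementation of the pixel-wise map T = Cµ1^{−1} ∘ Cµ0 (illustrative sketch, my own function names, assuming images are numpy arrays with values in [0, 1]):

import numpy as np

def histogram_transfer(f0, f1):
    """Apply T = C_{mu1}^{-1} o C_{mu0} pixel-wise: the k-th darkest pixel of f0
    receives the k-th darkest gray value of f1 (monotone remapping)."""
    src = f0.ravel()
    ref = np.sort(f1.ravel())
    order = np.argsort(src)
    out = np.empty_like(src)
    out[order] = ref[np.linspace(0, ref.size - 1, src.size).astype(int)]
    return out.reshape(f0.shape)

# usage on two synthetic "images"
f0 = np.random.rand(64, 64) ** 2          # dark-ish image
f1 = np.random.rand(48, 48)               # flat histogram
g = histogram_transfer(f0, f1)            # g keeps f0's geometry, f1's histogram
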
Application to Color Transfer

Input color images: fi ∈ R^{N×3}, with pixel values fi(x) ∈ R^3, x = 1, ..., N.
Color distributions: µi = (1/N) Σ_x δ_{fi(x)}.

Optimal assignment:
    min_{σ ∈ Σ_N} ||f0 − f1 ∘ σ||      (Σ_N: permutations of {1, ..., N})

Transport:  T : f0(x) ∈ R^3  ↦  f1(σ(x)) ∈ R^3

Equalization:  f0~ = T(f0), so that the color distribution of f0~ matches that of f1.

[Figures, from J. Rabin: source image f0 with color distribution µ0, style image f1
 with color distribution µ1, and the source image after color transfer T(f0)]

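For small point clouds the optimal assignment σ can be computed exactly with the Hungarian algorithm; the sketch below (my own, assuming SciPy is available) works on a subsample of pixel colors, since the exact assignment scales as O(N³) and sliced or sorted approximations are used in practice:

import numpy as np
from scipy.optimize import linear_sum_assignment

def color_transfer_assignment(f0, f1):
    """Exact optimal assignment between two small clouds of RGB values
    (Hungarian algorithm, O(N^3)); f0, f1 have shape (N, 3)."""
    cost = ((f0[:, None, :] - f1[None, :, :]) ** 2).sum(axis=2)   # ||f0(x) - f1(y)||^2
    _, sigma = linear_sum_assignment(cost)
    return f1[sigma]                                              # T(f0(x)) = f1(sigma(x))

# usage on a random subsample of pixel colors
rng = np.random.default_rng(0)
f0, f1 = rng.random((300, 3)), rng.random((300, 3))
f0_new = color_transfer_assignment(f0, f1)
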
Image Registration

Optimal transport maps are also used to register medical images: the map T gives the
deformation between the two acquisitions.

[Figure from [ur Rehman et al., 2009]: OMT results viewed on an axial slice; the top row
 shows corresponding slices from pre-op (left) and post-op (right) MRI data; the
 deformation is clearly visible in the anterior part of the brain.]

Convex Formulation (Benamou-Brenier)

Find  ρ : R^d × [0,1] → R+  and  m : R^d × [0,1] → R^d  solving

    W(µ0, µ1)² = min_{x = (m, ρ)} J(x) + ι_C(x)

    J(x) = ∫_{s ∈ R^d} ∫_{t=0}^{1} j(x(s, t)) dt ds

    j(m~, ρ~) = ||m~||² / ρ~    if ρ~ > 0,
                0               if ρ~ = 0 and m~ = 0,        (m~ ∈ R^d, ρ~ ∈ R)
                +∞              otherwise.

    C = {x = (m, ρ) : div(x) = 0, B(ρ) = (ρ0, ρ1)},    B(ρ) = (ρ(0, ·), ρ(1, ·))
        (div is the space-time divergence: the continuity equation ∂t ρ + div_s(m) = 0)

Numerical Examples

[Figure: densities ρ0 and ρ1 and the interpolation ρ(·, t) as t goes from 0 to 1
 (synthetic 2D examples on a Euclidean domain)]

Discrete Formulation

Centered grid formulation (d = 1):
    min_{x ∈ R^{Gc×2}} J(x) + ι_C(x),      J(x) = Σ_{i ∈ Gc} j(xi)

Staggered grid formulation:
    min_{x ∈ R^{Gst¹} × R^{Gst²}} J(I(x)) + ι_C(x)

Interpolation operator:
    I = (I¹, I²) : R^{Gst¹} × R^{Gst²} → R^{Gc}
    2 I¹(m)_{i,j} = m_{i+1/2, j} + m_{i−1/2, j}

→ Projection on div(x) = 0 using FFTs.

[Figures: the centered grid Gc and the staggered grids Gst¹, Gst² in the (t, s) plane]

SOCP Formulation

    min_{x ∈ R^{Gc×d}} J(x) + ι_C(x),      J(x) = Σ_{i ∈ Gc} j(xi)

⟺  min_{x ∈ R^{Gc×d}, r ∈ R^{Gc}} Σ_i ri    s.t.   ∀ i ∈ Gc, (mi, ρi, ri) ∈ K

(Rotated) Lorentz cone:  K = {(m~, ρ~, r~) ∈ R^{d+2} : ||m~||² ≤ ρ~ r~}

Second-order cone program:
    → Use interior point methods (e.g. MOSEK software).
        Linear convergence with iteration #.
        Poor scaling with dimension |Gc|.
    → Efficient for medium-scale problems (N ∼ 10^4).

Example: ℓ¹ Regularization

Inverse problem: measurements  y = Φ x0 + w

Regularized inversion:
    x* ∈ argmin_{x ∈ R^N} (1/2)||y − Φ x||² + λ R(x)
                            Data fidelity      Regularity

Total Variation:  R(x) = Σ_i ||(∇x)_i||

ℓ¹ sparsity:  R(x) = Σ_i |xi|
    Images are sparse in wavelet bases:  image f = Ψ x,  coefficients x = Ψ* f.

[Figure: original x0, measurements y, and the regularized recovery x*]

Overview
• Optimal Transport and Imaging

• Convex Analysis and Proximal Calculus

• Forward-Backward

• Douglas-Rachford and ADMM

• Generalized Forward-Backward

• Primal-Dual Schemes
Convex Optimization

Setting:  G : H → R ∪ {+∞}
    H: Hilbert space. Here: H = R^N.

        Problem:   min_{x ∈ H} G(x)

Class of functions:

    Convex:  G(tx + (1−t)y) ≤ t G(x) + (1−t) G(y),   ∀ t ∈ [0, 1]

    Lower semi-continuous:  lim inf_{x → x0} G(x) ≥ G(x0)

    Proper:  {x ∈ H : G(x) ≠ +∞} ≠ ∅

Indicator:  ι_C(x) = 0 if x ∈ C,  +∞ otherwise.     (C closed and convex)

   (C closed and convex)
Sub-differential

Sub-di erential:
      G(x) = {u ⇥ H  ⇤ z, G(z)   G(x) + ⌅u, z   x⇧}
                                             G(x) = |x|



                                             G(0) = [ 1, 1]
Sub-differential

Sub-di erential:
      G(x) = {u ⇥ H  ⇤ z, G(z)       G(x) + ⌅u, z   x⇧}
                                                 G(x) = |x|
Smooth functions:
     If F is C 1 , F (x) = { F (x)}

                                                 G(0) = [ 1, 1]
Sub-differential

Sub-di erential:
        G(x) = {u ⇥ H  ⇤ z, G(z)     G(x) + ⌅u, z   x⇧}
                                                 G(x) = |x|
Smooth functions:
     If F is C 1 , F (x) = { F (x)}

                                                 G(0) = [ 1, 1]
First-order conditions:
    x      argmin G(x)        0     G(x )
            x H
Sub-differential

Sub-di erential:
        G(x) = {u ⇥ H  ⇤ z, G(z)          G(x) + ⌅u, z   x⇧}
                                                      G(x) = |x|
Smooth functions:
     If F is C 1 , F (x) = { F (x)}

                                                      G(0) = [ 1, 1]
First-order conditions:
    x      argmin G(x)            0       G(x )
             x H                                                U (x)

                                                                    x
Monotone operator:        U (x) = G(x)
        (u, v)   U (x)   U (y),       y   x, v    u   0
Prox and Subdifferential

    Prox_{γG}(x) = argmin_z (1/2)||x − z||² + γ G(z)

Resolvent of ∂G:
    z = Prox_{γG}(x)   ⟺   0 ∈ z − x + γ ∂G(z)
                       ⟺   x ∈ (Id + γ ∂G)(z)   ⟺   z = (Id + γ ∂G)^{−1}(x)

Inverse of a set-valued mapping:   x ∈ U(y)  ⟺  y ∈ U^{−1}(x)
    Prox_{γG} = (Id + γ ∂G)^{−1} is a single-valued mapping.

Fixed point:  x* ∈ argmin_x G(x)
    ⟺  0 ∈ ∂G(x*)   ⟺   x* ∈ (Id + γ ∂G)(x*)
    ⟺  x* = (Id + γ ∂G)^{−1}(x*) = Prox_{γG}(x*)

Proximal Calculus

Separability:  G(x) = G1(x1) + ... + Gn(xn)
    Prox_G(x) = (Prox_{G1}(x1), ..., Prox_{Gn}(xn))

Quadratic functionals:  G(x) = (1/2)||Φx − y||²
    Prox_{γG}(x) = (Id + γ Φ*Φ)^{−1}(x + γ Φ*y)

Composition by a tight frame (A A* = Id):
    Prox_{G∘A} = A* ∘ Prox_G ∘ A + Id − A*A

Indicators:  G(x) = ι_C(x)    (C closed and convex)
    Prox_{γG}(x) = Proj_C(x) = argmin_{z ∈ C} ||x − z||

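A quick numpy check of the quadratic-prox formula above (illustrative sketch; names are mine):

import numpy as np

def prox_quadratic(x, Phi, y, gamma):
    """Prox of gamma*G with G(x) = 0.5*||Phi x - y||^2:
    (Id + gamma Phi^T Phi)^{-1} (x + gamma Phi^T y)."""
    n = Phi.shape[1]
    return np.linalg.solve(np.eye(n) + gamma * Phi.T @ Phi, x + gamma * Phi.T @ y)

# sanity check: the optimality condition (z - x) + gamma*Phi^T(Phi z - y) = 0 holds
rng = np.random.default_rng(0)
Phi, y, x = rng.standard_normal((20, 10)), rng.standard_normal(20), rng.standard_normal(10)
z = prox_quadratic(x, Phi, y, 0.7)
print(np.abs((z - x) + 0.7 * Phi.T @ (Phi @ z - y)).max())   # ~1e-15
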
Prox of Sparse Regularizers

    Prox_{γG}(x) = argmin_z (1/2)||x − z||² + γ G(z)

G(x) = ||x||₁ = Σ_i |xi|         (soft thresholding)
    Prox_{γG}(x)_i = max(0, 1 − γ/|xi|) xi

G(x) = ||x||₀ = |{i : xi ≠ 0}|   (hard thresholding)
    Prox_{γG}(x)_i = xi if |xi| ≥ √(2γ),  0 otherwise.

G(x) = Σ_i log(1 + |xi|²)
    → root of a 3rd-order polynomial.

[Figure: the regularizers |x|, ||x||₀ and log(1 + x²) (top), and the corresponding
 maps x ↦ Prox_{γG}(x) (bottom)]

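The two thresholding rules above in numpy (illustrative sketch, names are mine):

import numpy as np

def prox_l1(x, gamma):
    """Soft thresholding: prox of gamma*||.||_1, applied entry-wise."""
    return np.maximum(0.0, 1.0 - gamma / np.maximum(np.abs(x), 1e-12)) * x

def prox_l0(x, gamma):
    """Hard thresholding: prox of gamma*||.||_0, keeps entries with |x_i| >= sqrt(2*gamma)."""
    return np.where(np.abs(x) >= np.sqrt(2.0 * gamma), x, 0.0)

x = np.array([-3.0, -0.5, 0.0, 0.2, 2.0])
print(prox_l1(x, 1.0))   # [-2. -0.  0.  0.  1.]
print(prox_l0(x, 1.0))   # [-3.  0.  0.  0.  2.]
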
Legendre-Fenchel Duality

Legendre-Fenchel transform:
    G*(u) = sup_{x ∈ dom(G)} ⟨u, x⟩ − G(x)

[Figure: G(x) and a supporting line of slope u; −G*(u) is its intercept]

Example: quadratic functional
    G(x) = (1/2)⟨Ax, x⟩ + ⟨x, b⟩
    G*(u) = (1/2)⟨u − b, A^{−1}(u − b)⟩

Moreau's identity:
    Prox_{γG*}(x) = x − γ Prox_{G/γ}(x/γ)
    G simple  ⟺  G* simple

Indicator and Homogeneous Functionals

Positively 1-homogeneous functional:  G(λx) = |λ| G(x)
    Example: norm  G(x) = ||x||

Duality:   G*(x) = ι_{G°(·) ≤ 1}(x)   where   G°(y) = max_{G(x) ≤ 1} ⟨x, y⟩

ℓp norms:  G(x) = ||x||_p,  G°(x) = ||x||_q,    1/p + 1/q = 1,   1 ≤ p, q ≤ +∞

Example: proximal operator of the ℓ∞ norm
    Prox_{γ||·||∞} = Id − Proj_{||·||₁ ≤ γ}
    Proj_{||·||₁ ≤ γ}(x)_i = max(0, 1 − τ/|xi|) xi
        for a well-chosen τ = τ(x, γ)
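One standard way to obtain the "well-chosen" τ is the sort-based projection onto the ℓ1 ball; a numpy sketch of that projection and of the resulting Prox of the ℓ∞ norm via Moreau's identity (helper names are mine):

import numpy as np

def proj_l1_ball(x, radius):
    """Euclidean projection onto {z : ||z||_1 <= radius} (sort-based)."""
    if np.abs(x).sum() <= radius:
        return x.copy()
    u = np.sort(np.abs(x))[::-1]
    cssv = np.cumsum(u)
    k = np.nonzero(u * np.arange(1, x.size + 1) > cssv - radius)[0][-1]
    tau = (cssv[k] - radius) / (k + 1.0)      # the "well-chosen" threshold tau(x, radius)
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def prox_linf(x, gamma):
    """Moreau identity: prox of gamma*||.||_inf = Id - projection onto the l1-ball of radius gamma."""
    return x - proj_l1_ball(x, gamma)

print(prox_linf(np.array([3.0, -1.0, 0.5]), 1.0))   # [ 2.  -1.   0.5]
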
Prox of the J Functional

J(m, ρ) = Σ_i j(mi, ρi),      j(m~, ρ~) = ||m~||² / ρ~   for ρ~ > 0

Prox_{γJ}(m, ρ) = (Prox_{γj}(mi, ρi))_i

j* = ι_C   where   C = {(a, b) ∈ R^d × R : ||a||²/4 + b ≤ 0}

Prox_{γj}(x~) = x~ − γ Proj_C(x~/γ)     where  x~ = (m~, ρ~)

Proposition:
    Prox_{γj}(m~, ρ~) = (m*, ρ*)  if ρ* > 0,   and  (0, 0)  otherwise,
    where  m* = ρ* m~ / (ρ* + 2γ)  and  ρ* is the largest real root of
    X³ + (4γ − ρ~) X² + 4γ(γ − ρ~) X − γ||m~||² − 4γ²ρ~ = 0
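A direct numerical transcription of this proposition (illustrative sketch, my own names; the cubic is solved with np.roots and checked against the optimality conditions of the prox problem):

import numpy as np

def prox_j(m, rho, gamma):
    """Prox of gamma*j at one grid point, j(m, rho) = ||m||^2 / rho:
    rho* is the largest real root of the cubic, m* = rho* m / (rho* + 2 gamma)."""
    coeffs = [1.0,
              4.0 * gamma - rho,
              4.0 * gamma * (gamma - rho),
              -gamma * np.dot(m, m) - 4.0 * gamma**2 * rho]
    roots = np.roots(coeffs)
    rho_star = roots[np.abs(roots.imag) < 1e-8].real.max()
    if rho_star <= 0:
        return np.zeros_like(m), 0.0
    return rho_star * m / (rho_star + 2.0 * gamma), rho_star

# sanity check: the optimality conditions of the prox objective vanish
m_s, r_s = prox_j(np.array([0.3, -0.2]), 0.5, 0.1)
print((m_s - np.array([0.3, -0.2])) + 0.2 * m_s / r_s)        # ~0
print((r_s - 0.5) - 0.1 * np.dot(m_s, m_s) / r_s**2)          # ~0
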
Overview
• Optimal Transport and Imaging

• Convex Analysis and Proximal Calculus

• Forward-Backward

• Douglas-Rachford and ADMM

• Generalized Forward-Backward

• Primal-Dual Schemes
Gradient and Proximal Descents

Gradient descent:  x(ℓ+1) = x(ℓ) − τ ∇G(x(ℓ))          [explicit]
    G is C¹ and ∇G is L-Lipschitz.
    Theorem:  if 0 < τ < 2/L,  x(ℓ) → x*  a solution.

Sub-gradient descent:  x(ℓ+1) = x(ℓ) − τℓ v(ℓ),   v(ℓ) ∈ ∂G(x(ℓ))
    Theorem:  if τℓ ∼ 1/ℓ,  x(ℓ) → x*  a solution.
    Problem: slow.

Proximal-point algorithm:  x(ℓ+1) = Prox_{τℓ G}(x(ℓ))   [implicit]
    Theorem:  if τℓ ≥ c > 0,  x(ℓ) → x*  a solution.     [Rockafellar, 70]
    Problem: Prox_{τG} hard to compute.

Proximal Splitting Methods

Solve  min_{x ∈ H} E(x).
Problem: Prox_{γE} is not available.

Splitting:  E(x) = F(x) + Σ_i Gi(x)
                   Smooth    Simple

Iterative algorithms using only  ∇F(x)  and  Prox_{γGi}(x):

                          solves
    Forward-Backward:     F + G
    Douglas-Rachford:     Σ_i Gi
    Primal-Dual:          Σ_i Gi ∘ Ai
    Generalized FB:       F + Σ_i Gi

Smooth + Simple Splitting

Inverse problem: measurements  y = K f0 + w
    K : R^N → R^P,   P ≤ N

Model: f0 = Ψ x0 is sparse in a dictionary Ψ.
Sparse recovery: f = Ψ x where x solves
    min_{x ∈ R^N} F(x) + G(x)
                  Smooth  Simple

Data fidelity:   F(x) = (1/2)||y − Φ x||²,    Φ = K Ψ
Regularization:  G(x) = λ||x||₁ = λ Σ_i |xi|

Forward-Backward

Fixed-point equation:
    x* ∈ argmin_x F(x) + G(x)   ⟺   0 ∈ ∇F(x*) + ∂G(x*)
    ⟺  (x* − γ ∇F(x*)) ∈ x* + γ ∂G(x*)
    ⟺  x* = Prox_{γG}(x* − γ ∇F(x*))

Forward-backward:  x(ℓ+1) = Prox_{γG}( x(ℓ) − γ ∇F(x(ℓ)) )

Projected gradient descent:  G = ι_C.

Theorem:  let ∇F be L-Lipschitz.
    If γ < 2/L,  x(ℓ) → x*  a solution of (*).
                                     [Passty 79, Gabay 83]

Example: ℓ¹ Regularization

    min_x (1/2)||Φx − y||² + λ||x||₁    ⟺    min_x F(x) + G(x)

    F(x) = (1/2)||Φx − y||²
        ∇F(x) = Φ*(Φx − y),      L = ||Φ*Φ||

    G(x) = λ||x||₁
        Prox_{γG}(x)_i = max(0, 1 − λγ/|xi|) xi

Forward-backward  =  iterative soft thresholding
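A minimal numpy sketch of this iterative soft thresholding (names are mine, not from the slides):

import numpy as np

def ista(Phi, y, lam, n_iter=300):
    """Forward-backward / iterative soft thresholding for
    min_x 0.5*||Phi x - y||^2 + lam*||x||_1."""
    L = np.linalg.norm(Phi, 2) ** 2                            # Lipschitz constant of grad F
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        z = x - (1.0 / L) * (Phi.T @ (Phi @ x - y))            # forward (gradient) step
        x = np.maximum(0.0, 1.0 - lam / (L * np.maximum(np.abs(z), 1e-12))) * z   # backward (prox) step
    return x

# usage: sparse recovery from random measurements
rng = np.random.default_rng(0)
Phi = rng.standard_normal((50, 200))
x0 = np.zeros(200); x0[rng.choice(200, 5, replace=False)] = 1.0
x = ista(Phi, y=Phi @ x0, lam=0.05)
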
Convergence Speed

    min_x E(x) = F(x) + G(x)
        ∇F is L-Lipschitz,  G is simple.

Theorem:  if L > 0, the FB iterates x(ℓ) satisfy
    E(x(ℓ)) − E(x*) ≤ C/ℓ

    The constant C degrades with L.

Multi-step Accelerations

Beck-Teboulle accelerated FB (FISTA):  t(0) = 1

    x(ℓ+1) = Prox_{G/L}( y(ℓ) − (1/L) ∇F(y(ℓ)) )
    t(ℓ+1) = ( 1 + √(1 + 4 t(ℓ)²) ) / 2
    y(ℓ+1) = x(ℓ+1) + ((t(ℓ) − 1)/t(ℓ+1)) (x(ℓ+1) − x(ℓ))

    (see also Nesterov's method)

Theorem:  if L > 0,   E(x(ℓ)) − E(x*) ≤ C/ℓ²

Complexity theory: optimal in a worst-case sense.
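A sketch of this acceleration on the same ℓ¹ problem as before (illustrative only; names are mine):

import numpy as np

def fista(Phi, y, lam, n_iter=300):
    """Beck-Teboulle accelerated forward-backward for min 0.5*||Phi x - y||^2 + lam*||x||_1."""
    L = np.linalg.norm(Phi, 2) ** 2
    soft = lambda z, t: np.maximum(0.0, 1.0 - t / np.maximum(np.abs(z), 1e-12)) * z
    x = np.zeros(Phi.shape[1]); v = x.copy(); t = 1.0
    for _ in range(n_iter):
        x_new = soft(v - (1.0 / L) * (Phi.T @ (Phi @ v - y)), lam / L)   # prox-gradient at y^(l)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0                 # momentum update
        v = x_new + ((t - 1.0) / t_new) * (x_new - x)                    # extrapolated point y^(l+1)
        x, t = x_new, t_new
    return x
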
Overview
• Optimal Transport and Imaging

• Convex Analysis and Proximal Calculus

• Forward-Backward

• Douglas-Rachford and ADMM

• Generalized Forward-Backward

• Primal-Dual Schemes
Douglas-Rachford Scheme

    min_x G1(x) + G2(x)      (*)
          Simple    Simple

Douglas-Rachford iterations:
    z(ℓ+1) = (1 − α/2) z(ℓ) + (α/2) RProx_{γG2}( RProx_{γG1}(z(ℓ)) )
    x(ℓ+1) = Prox_{γG1}(z(ℓ+1))

Reflected prox:  RProx_{γG}(x) = 2 Prox_{γG}(x) − x

Theorem:  if 0 < α < 2 and γ > 0,
    x(ℓ) → x*  a solution of (*).
                                    [Lions, Mercier, 79]

DR Fixed-Point Equation

    min_x G1(x) + G2(x)    ⟺    0 ∈ ∂(G1 + G2)(x)

⟺  ∃ z:   z − x ∈ γ ∂G1(x)   and   x − z ∈ γ ∂G2(x)

⟺  x = Prox_{γG1}(z)   and   (2x − z) − x ∈ γ ∂G2(x)

    x = Prox_{γG2}(2x − z) = Prox_{γG2} ∘ RProx_{γG1}(z)

    z = 2x − (2x − z)
      = 2 Prox_{γG2} ∘ RProx_{γG1}(z) − RProx_{γG1}(z)
      = RProx_{γG2} ∘ RProx_{γG1}(z)

⟺  z = (1 − α/2) z + (α/2) RProx_{γG2} ∘ RProx_{γG1}(z)

Example: Optimal Transport on Centered Grid

    min_{x ∈ R^{Gc×2}} J(x) + ι_C(x)

    C = {x = (m, ρ) : Ax = b},    b = (0, ρ0, ρ1)
    A(x) = (div(x), ρ|_{I0}, ρ|_{I1})
        (I0, I1: boundary grid lines at t = 0 and t = 1 of the centered grid Gc)

Prox_{γJ}: cubic root (closed form).

Prox_{γιC} = Proj_C,   Proj_C(x) = (Id − A* Δ^{−1} A) x + A* Δ^{−1} b,   Δ^{−1} = (AA*)^{−1}:
    solving a Poisson equation with boundary conditions.

Proposition:  DR(α = 1) is ALG2 of [Benamou, Brenier 2000]

→ Advantage: relaxation parameter α ∈ ]0, 2[ (ALG2 corresponds to α = 1).

Example: Constrained ℓ¹

    min_{Φx = y} ||x||₁     ⟺     min_x G1(x) + G2(x)

G1(x) = ι_C(x),   C = {x : Φx = y}
    Prox_{γG1}(x) = Proj_C(x) = x + Φ*(ΦΦ*)^{−1}(y − Φx)

G2(x) = ||x||₁
    Prox_{γG2}(x) = ( max(0, 1 − γ/|xi|) xi )_i

→ efficient if ΦΦ* is easy to invert.

Example: compressed sensing
    Φ ∈ R^{100×400} Gaussian matrix,  y = Φ x0,  ||x0||₀ = 17
    [Figure: log10(||x(ℓ)||₁ − ||x*||₁) over 250 DR iterations, for γ = 0.01, 1, 10]
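A numpy sketch of this Douglas-Rachford scheme on the same kind of compressed sensing instance (illustrative only; names and the 500-iteration budget are mine):

import numpy as np

def dr_basis_pursuit(Phi, y, gamma=1.0, alpha=1.0, n_iter=500):
    """Douglas-Rachford for min ||x||_1 s.t. Phi x = y,
    with G1 = indicator of {Phi x = y} (prox = affine projection) and G2 = ||.||_1."""
    gram_inv = np.linalg.inv(Phi @ Phi.T)                # cheap when Phi Phi^T is easy to invert
    proj_C = lambda x: x + Phi.T @ (gram_inv @ (y - Phi @ x))
    soft = lambda x, t: np.sign(x) * np.maximum(np.abs(x) - t, 0.0)
    z = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        x = proj_C(z)                                    # Prox_{gamma G1}(z)
        r1 = 2.0 * x - z                                 # RProx_{gamma G1}(z)
        r2 = 2.0 * soft(r1, gamma) - r1                  # RProx_{gamma G2}(r1)
        z = (1.0 - alpha / 2.0) * z + (alpha / 2.0) * r2
    return proj_C(z)

rng = np.random.default_rng(0)
Phi = rng.standard_normal((100, 400))
x0 = np.zeros(400); x0[rng.choice(400, 17, replace=False)] = rng.standard_normal(17)
x = dr_basis_pursuit(Phi, Phi @ x0)
print(np.linalg.norm(x - x0))    # should be small in this compressed sensing regime
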
Auxiliary Variables with DR

    min_x G1(x) + G2(A(x)),     linear map A : E → H,     G1, G2 simple.

⟺  min_{z ∈ H × E} G(z) + ι_C(z)
        G(x, y) = G1(x) + G2(y)
        C = {(x, y) ∈ H × E : Ax = y}

Prox_{γG}(x, y) = (Prox_{γG1}(x), Prox_{γG2}(y))

Prox_{γιC}(x, y) = (x~, A x~)
    where   x~ = (Id + A*A)^{−1}(A* y + x)
            (equivalently x~ = x − A* y~ with y~ = (Id + AA*)^{−1}(Ax − y))

→ efficient if Id + AA* or Id + A*A is easy to invert.

Example: TV Regularization

    min_f (1/2)||Kf − y||² + λ||∇f||₁,        ||u||₁ = Σ_i ||ui||

⟺  min_{f, u} G2(f) + G1(u) + ι_C(f, u)     (auxiliary variable u = ∇f)

G1(u) = λ||u||₁
    Prox_{γG1}(u)_i = max(0, 1 − λγ/||ui||) ui

G2(f) = (1/2)||Kf − y||²
    Prox_{γG2}(f) = (Id + γ K*K)^{−1}(f + γ K*y)

C = {(f, u) ∈ R^N × R^{N×2} : u = ∇f}
    Prox_{γιC}(f, u) = (f~, ∇f~)   where f~ solves   (Id + ∇*∇) f~ = f − div(u)
    → O(N log(N)) operations using FFT.
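A sketch of this FFT solve for the projection onto {u = ∇f} (illustrative only; the periodic boundary conditions and forward-difference discretization are my own choices):

import numpy as np

def grad(f):
    """Periodic forward-difference gradient, shape (2, n, m)."""
    return np.stack([np.roll(f, -1, axis=0) - f, np.roll(f, -1, axis=1) - f])

def div(u):
    """div = -grad^*: periodic backward differences."""
    return (u[0] - np.roll(u[0], 1, axis=0)) + (u[1] - np.roll(u[1], 1, axis=1))

def proj_gradient_constraint(f, u):
    """Projection on C = {(f, u) : u = grad f}: solve (Id + grad^* grad) f~ = f - div(u) by FFT."""
    n, m = f.shape
    kx = 2.0 * np.pi * np.fft.fftfreq(n)[:, None]
    ky = 2.0 * np.pi * np.fft.fftfreq(m)[None, :]
    symbol = 1.0 + (2 - 2 * np.cos(kx)) + (2 - 2 * np.cos(ky))   # eigenvalues of Id + grad^* grad
    f_t = np.real(np.fft.ifft2(np.fft.fft2(f - div(u)) / symbol))
    return f_t, grad(f_t)
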
Example: TV Regularization

[Figure: original f0, observations y, TV-regularized recovery f*, and the decay of the
 objective with the iterations]

Alternating Direction Method of Multipliers

    min_x F(x) + G(A(x))    (*)     ⟺     min_{x, y = Ax} F(x) + G(y)
        A : R^N → R^P injective.

Lagrangian:
    min_{x,y} max_u  L(x, y, u) = F(x) + G(y) + ⟨u, y − Ax⟩

Augmented Lagrangian:
    min_{x,y} max_u  L_γ(x, y, u) = L(x, y, u) + (γ/2)||y − Ax||²

ADMM:
    x(ℓ+1) = argmin_x L_γ(x, y(ℓ), u(ℓ))
    y(ℓ+1) = argmin_y L_γ(x(ℓ+1), y, u(ℓ))
    u(ℓ+1) = u(ℓ) + (y(ℓ+1) − A x(ℓ+1))

Theorem:  if γ > 0,  x(ℓ) → x*  a solution of (*).
                 [Gabay, Mercier, Glowinski, Marrocco, 76]
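A small concrete instance of this loop (illustrative sketch, my own discretization and names): 1-D TV denoising, F(x) = ½||x − b||², G = λ||·||₁, A = D the forward-difference matrix; the x-update is a linear solve, the y-update a soft threshold:

import numpy as np

def admm_tv1d(b, lam, gamma=1.0, n_iter=300):
    """ADMM sketch for min_x 0.5*||x - b||^2 + lam*||D x||_1, split as F(x) + G(y), y = D x."""
    n = b.size
    D = np.diff(np.eye(n), axis=0)                       # (n-1) x n difference matrix
    M = np.eye(n) + gamma * D.T @ D                      # x-update normal equations
    soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)
    y = np.zeros(n - 1); u = np.zeros(n - 1)
    for _ in range(n_iter):
        x = np.linalg.solve(M, b + gamma * D.T @ (y + u))   # x-update (quadratic subproblem)
        y = soft(D @ x - u, lam / gamma)                     # y-update (prox of G/gamma)
        u = u + y - D @ x                                    # scaled dual update
    return x

# usage: denoise a noisy piecewise-constant signal
rng = np.random.default_rng(0)
x0 = np.repeat([0.0, 1.0, 0.3], 50)
x = admm_tv1d(x0 + 0.1 * rng.standard_normal(x0.size), lam=0.2)
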
ADMM with Proximal Operators

Proximal mapping for the metric of A (A injective):
    Prox^A_F(z) = argmin_x (1/2)||Ax − z||² + F(x)

Proposition:   Prox^A_F(z) = A⁺( z − Prox_{F*∘A*}(z) )
    (A⁺: pseudo-inverse of A; apply it to F/γ to get Prox^A_{F/γ})

ADMM in proximal form:
    x(ℓ+1) = Prox^A_{F/γ}( y(ℓ) + u(ℓ) )
    y(ℓ+1) = Prox_{G/γ}( A x(ℓ+1) − u(ℓ) )
    u(ℓ+1) = u(ℓ) + y(ℓ+1) − A x(ℓ+1)

→ If G ∘ A is simple: use DR.
→ If F* ∘ A* is simple: use ADMM.
ADMM vs. DR

Fenchel-Rockafellar duality:
    min_x F(x) + G(A(x))     ⟷     min_u F*(−A*u) + G*(u)
Important: no bijection between u and x.

Proposition:  DR applied to F*∘(−A*) + G* is ADMM.
                                   [Eckstein, Bertsekas, 92]

DR iterations (when α = 1):
    z(ℓ+1) = (1/2) z(ℓ) + (1/2) RProx_{γ F*∘(−A*)}( RProx_{γG*}(z(ℓ)) )

The iterates of ADMM are recovered using:
    u(ℓ) = Prox_{γG*}(z(ℓ))
    y(ℓ) = (1/γ)(z(ℓ) − u(ℓ))
    x(ℓ+1) = Prox^A_{F/γ}( y(ℓ) + u(ℓ) )
More than 2 Functionals

    min_x  G1(x) + ... + Gk(x)          each Gi is simple

 ⟺  min_{x1, ..., xk}  G(x1, ..., xk) + ι_C(x1, ..., xk)

    G(x1, ..., xk) = G1(x1) + ... + Gk(xk)
    C = { (x1, ..., xk) ∈ H^k ;  x1 = ... = xk }

G and ι_C are simple:
    Prox_{γG}(x1, ..., xk) = ( Prox_{γGi}(xi) )_i
    Prox_{γι_C}(x1, ..., xk) = (x̃, ..., x̃)    where   x̃ = (1/k) Σ_i xi
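A small sketch of this product-space trick, with toy functions chosen for illustration: Douglas-Rachford is applied to G + ι_C, using the separable prox of G and the consensus projection described above.

import numpy as np

gamma = 1.0

def prox_G(X, gamma):
    # separable prox; toy choice: G1 = |.-1|, G2 = |.+1|, G3 = 0.5*(.-0.5)^2
    x1, x2, x3 = X
    p1 = 1 + np.sign(x1 - 1) * max(abs(x1 - 1) - gamma, 0.0)    # prox of gamma*|.-1|
    p2 = -1 + np.sign(x2 + 1) * max(abs(x2 + 1) - gamma, 0.0)   # prox of gamma*|.+1|
    p3 = (x3 + 0.5 * gamma) / (1 + gamma)                       # prox of gamma*0.5*(.-0.5)^2
    return np.array([p1, p2, p3])

def prox_C(X):
    # projection on the consensus set {x1 = ... = xk}: replicate the average
    return np.full_like(X, X.mean())

z = np.zeros(3)
for _ in range(200):
    x = prox_G(z, gamma)                 # prox of the separable part
    z = z + prox_C(2 * x - z) - x        # Douglas-Rachford update
print(prox_G(z, gamma))                  # all three copies converge to the minimizer, approx. 0.5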
Overview
• Optimal Transport and Imaging

• Convex Analysis and Proximal Calculus

• Forward Backward

• Douglas Rachford and ADMM

• Generalized Forward-Backward

• Primal-Dual Schemes
GFB Splitting

    min_{x ∈ R^N}  F(x) + Σ_{i=1}^n Gi(x)      (⋆)
                  Smooth     Simple (i = 1, ..., n)

    ∀ i = 1, ..., n:
    zi^(ℓ+1) = zi^(ℓ) + Prox_{nγ Gi}( 2x^(ℓ) − zi^(ℓ) − γ ∇F(x^(ℓ)) ) − x^(ℓ)
    x^(ℓ+1) = (1/n) Σ_{i=1}^n zi^(ℓ+1)

Theorem:   Let ∇F be L-Lipschitz.
           If γ < 2/L,   x^(ℓ) → x⋆ a solution of (⋆).

                                 [Raguet, Fadili, Peyré 2012]

    n = 1   →  Forward-Backward.
    F = 0   →  Douglas-Rachford.
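A minimal sketch of the GFB iteration on a toy problem (assumed here: F(x) = ½||x − y||², G1 = λ||·||₁ and G2 the indicator of the positive orthant), following the update rule of the slide.

import numpy as np

def gfb(y, lam=0.5, gamma=1.0, n_iter=200):
    # min_x 0.5*||x - y||^2 + lam*||x||_1 + iota_{x >= 0}(x)
    # F(x) = 0.5*||x - y||^2 is smooth (grad F = x - y, L = 1); G1, G2 are simple.
    n = 2
    prox = [
        lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t * lam, 0.0),  # prox of t*G1
        lambda v, t: np.maximum(v, 0.0),                                  # prox of t*G2
    ]
    z = [np.zeros_like(y) for _ in range(n)]
    x = np.zeros_like(y)
    for _ in range(n_iter):
        grad = x - y                                   # grad F(x)
        for i in range(n):
            z[i] = z[i] + prox[i](2 * x - z[i] - gamma * grad, n * gamma) - x
        x = sum(z) / n
    return x

y = np.array([-1.0, 0.2, 2.0])
print(gfb(y))      # expected approximately [0, 0, 1.5]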
GFB Fix Point

x⋆ ∈ argmin_{x ∈ R^N} F(x) + Σ_i Gi(x)    ⟺    0 ∈ ∇F(x⋆) + Σ_i ∂Gi(x⋆)
                                           ⟺    ∃ yi ∈ ∂Gi(x⋆),   ∇F(x⋆) + Σ_i yi = 0

   ⟺  ∃ (zi)_{i=1}^n,  ∀ i,   (1/n)( x⋆ − zi − γ ∇F(x⋆) ) ∈ γ ∂Gi(x⋆)
       and  x⋆ = (1/n) Σ_i zi        (use zi = x⋆ − γ ∇F(x⋆) − nγ yi)

   ⟺  ( 2x⋆ − zi − γ ∇F(x⋆) ) − x⋆ ∈ nγ ∂Gi(x⋆)
   ⟺  x⋆ = Prox_{nγ Gi}( 2x⋆ − zi − γ ∇F(x⋆) )
   ⟺  zi = zi + Prox_{nγ Gi}( 2x⋆ − zi − γ ∇F(x⋆) ) − x⋆

   ⟹  Fix point equation on (x⋆, z1, ..., zn).
Block Regularization

ℓ1 − ℓ2 block sparsity:   G(x) = Σ_{b ∈ B} ||x[b]||,     ||x[b]||² = Σ_{m ∈ b} x_m²

Non-overlapping decomposition:  B = B1 ∪ ... ∪ Bn

    G(x) = Σ_{i=1}^n Gi(x),       Gi(x) = Σ_{b ∈ Bi} ||x[b]||

Each Gi is simple:
    ∀ m ∈ b ∈ Bi,    Prox_{γ Gi}(x)_m = max( 0, 1 − γ/||x[b]|| ) x_m

[Figure: image f = Ψx and its coefficients x (N = 256); overlapping blocks b ∈ B split into non-overlapping families B1 and B2.]
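The prox of each Gi above is a block soft-thresholding. A short sketch, assuming the blocks of one family Bi are given as a list of index arrays:

import numpy as np

def prox_block_l1l2(x, blocks, gamma):
    # prox of gamma * sum_b ||x[b]|| for a non-overlapping family of blocks
    p = x.copy()
    for b in blocks:
        norm = np.linalg.norm(x[b])
        scale = max(0.0, 1.0 - gamma / norm) if norm > 0 else 0.0
        p[b] = scale * x[b]
    return p

x = np.array([3.0, 4.0, 0.1, -0.2, 1.0])
blocks = [np.array([0, 1]), np.array([2, 3]), np.array([4])]
print(prox_block_l1l2(x, blocks, gamma=1.0))
# block [3, 4] has norm 5 -> scaled by 0.8; the other two blocks have norm <= 1 -> set to 0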
Numerical Experiments

Deconvolution:               min_x  (1/2) ||Y − K x||² + λ Σ_{k=1}^{4}  ||x||_{Bk,1,2}
Deconvolution + Inpainting:  min_x  (1/2) ||Y − P K x||² + λ Σ_{k=1}^{16} ||x||_{Bk,1,2}

[Figure: decay of log10( E(x^(ℓ)) − E(x⋆) ) against iteration # for EFB (generalized forward-backward), PR and CP, on a 256×256 image with y = Φ x0 + w (noise 0.025, convolution width 2).
 Deconvolution: λ_{ℓ1/ℓ2} = 1.30e−03, SNR 22.49dB at it. #50; timings tEFB: 161s, tPR: 173s, tCP: 190s.
 Deconvolution + inpainting (degrad. 0.4): λ_{ℓ1/ℓ2} = 1.00e−03, SNR 21.80dB at it. #50; timings tEFB: 283s, tPR: 298s, tCP: 368s.]
Overview
• Optimal Transport and Imaging

• Convex Analysis and Proximal Calculus

• Forward Backward

• Douglas Rachford and ADMM

• Generalized Forward-Backward

• Primal-Dual Schemes
Primal-dual Formulation

Fenchel-Rockafellar duality:      A : H → L linear

    min_{x∈H} G1(x) + G2(A(x)) = min_x G1(x) + sup_{u∈L} ⟨Ax, u⟩ − G2⋆(u)

Strong duality:   0 ∈ ri(dom(G2)) − A ri(dom(G1))
(min ↔ max)       = max_u  −G2⋆(u) + min_x G1(x) + ⟨x, A⋆u⟩
                  = max_u  −G2⋆(u) − G1⋆(−A⋆u)

Recovering x⋆ from some u⋆:
    x⋆ = argmin_x  G1(x) + ⟨x, A⋆u⋆⟩
    ⟺   −A⋆u⋆ ∈ ∂G1(x⋆)
    ⟺   x⋆ ∈ (∂G1)^{−1}(−A⋆u⋆) = ∂G1⋆(−A⋆u⋆)
Forward-Backward on the Dual

If G1 is strongly convex:    ∇²G1 ≥ c Id
    G1(t x + (1−t) y) ≤ t G1(x) + (1−t) G1(y) − (c/2) t(1−t) ||x − y||²

    → x⋆ uniquely defined:    x⋆ = ∇G1⋆(−A⋆u⋆)
    → G1⋆ is of class C¹.

FB on the dual:    min_{x∈H} G1(x) + G2(A(x))
               = − min_{u∈L}  G1⋆(−A⋆u) + G2⋆(u)
                              Smooth       Simple

    u^(ℓ+1) = Prox_{τ G2⋆}( u^(ℓ) + τ A ∇G1⋆(−A⋆u^(ℓ)) )
Example: TV Denoising

    min_{f ∈ R^N}  (1/2) ||f − y||² + λ ||∇f||₁      ⟷      min_{||u||_∞ ≤ λ}  ||y + div(u)||²

    ||u||₁ = Σ_i ||u_i||           ||u||_∞ = max_i ||u_i||

Dual solution u⋆   →   Primal solution f⋆ = y + div(u⋆)
                                                     [Chambolle 2004]

FB (aka projected gradient descent):

    u^(ℓ+1) = Proj_{||·||_∞ ≤ λ}( u^(ℓ) + τ ∇( y + div(u^(ℓ)) ) )

    v = Proj_{||·||_∞ ≤ λ}(u)   ⟺   v_i = u_i / max( ||u_i||/λ, 1 )

    Convergence if  τ < 2/||div ∘ ∇|| = 1/4
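A compact sketch of this projected gradient descent on the dual. The discrete gradient/divergence pair below uses one common finite-difference convention, which is an assumption and not necessarily the one used in the slides.

import numpy as np

def grad(f):
    # forward differences with Neumann boundary conditions
    gx = np.vstack([f[1:, :] - f[:-1, :], np.zeros((1, f.shape[1]))])
    gy = np.hstack([f[:, 1:] - f[:, :-1], np.zeros((f.shape[0], 1))])
    return np.stack([gx, gy])

def div(u):
    # minus the adjoint of grad, so that <grad f, u> = -<f, div u>
    ux, uy = u
    dx = np.vstack([ux[:1, :], ux[1:-1, :] - ux[:-2, :], -ux[-2:-1, :]])
    dy = np.hstack([uy[:, :1], uy[:, 1:-1] - uy[:, :-2], -uy[:, -2:-1]])
    return dx + dy

def tv_denoise_dual(y, lam=0.1, tau=0.24, n_iter=300):
    # projected gradient descent on  min_{||u||_inf <= lam} 0.5*||y + div(u)||^2
    u = np.zeros((2,) + y.shape)
    for _ in range(n_iter):
        u = u + tau * grad(y + div(u))                 # gradient step
        amp = np.sqrt((u ** 2).sum(axis=0))
        u = u / np.maximum(amp / lam, 1.0)             # projection on {||u||_inf <= lam}
    return y + div(u)                                  # primal solution f = y + div(u)

rng = np.random.default_rng(0)
y = np.zeros((64, 64)); y[16:48, 16:48] = 1.0
y = y + 0.1 * rng.standard_normal(y.shape)
f = tv_denoise_dual(y, lam=0.2)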
Primal-Dual Algorithm

    min_{x ∈ H}  G1(x) + G2(A(x))
 ⟺  min_x max_z  G1(x) − G2⋆(z) + ⟨A(x), z⟩

    z^(ℓ+1) = Prox_{σ G2⋆}( z^(ℓ) + σ A(x̃^(ℓ)) )
    x^(ℓ+1) = Prox_{τ G1}( x^(ℓ) − τ A⋆(z^(ℓ+1)) )
    x̃^(ℓ+1) = x^(ℓ+1) + θ ( x^(ℓ+1) − x^(ℓ) )

    θ = 0:  Arrow-Hurwicz algorithm.
    θ = 1:  convergence speed on the duality gap.

Theorem: [Chambolle-Pock 2011]
    If 0 ≤ θ ≤ 1 and σ τ ||A||² < 1,  then  x^(ℓ) → x⋆ a minimizer of G1 + G2∘A.
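A minimal sketch of these iterations on a toy instance (assumed: G1 = ½||· − y||², G2 = λ||·||₁ and A a 1-D finite-difference matrix, i.e. 1-D TV denoising); the steps σ, τ are chosen so that στ||A||² < 1.

import numpy as np

def chambolle_pock(A, y, lam=0.5, n_iter=500):
    # min_x 0.5*||x - y||^2 + lam*||A x||_1
    # Prox_{tau G1}(v) = (v + tau*y)/(1 + tau); Prox_{sigma G2*} = projection on the l_inf ball of radius lam
    L = np.linalg.norm(A, 2)               # operator norm ||A||
    sigma = tau = 0.9 / L                  # sigma*tau*||A||^2 = 0.81 < 1
    theta = 1.0
    x = np.zeros(A.shape[1]); x_bar = x.copy()
    z = np.zeros(A.shape[0])
    for _ in range(n_iter):
        z = np.clip(z + sigma * (A @ x_bar), -lam, lam)        # dual prox step
        x_new = (x - tau * (A.T @ z) + tau * y) / (1.0 + tau)  # primal prox step
        x_bar = x_new + theta * (x_new - x)                    # extrapolation
        x = x_new
    return x

n = 50
y = np.concatenate([np.zeros(25), np.ones(25)]) + 0.1 * np.random.default_rng(0).standard_normal(n)
A = np.diff(np.eye(n), axis=0)             # (n-1) x n forward-difference matrix
print(chambolle_pock(A, y, lam=0.3)[:5])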
Example: Optimal Transport

Staggered grid formulation:

    min_{x ∈ R^{G¹st} × R^{G²st}}  J(I(x)) + ι_C(x)

Interpolation operator:   I = (I¹, I²) : R^{G¹st} × R^{G²st} → R^{Gc}

[Figure: staggered grid G¹st × G²st vs. centered grid Gc in the (s, t) plane, with I interpolating between them.]
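A rough sketch of the staggered-to-centered interpolation I, as midpoint averaging along each axis; the exact grid sizes and boundary conventions here are assumptions for illustration.

import numpy as np

def interp_staggered_to_centered(m1, m2):
    # average each staggered component onto the centered grid (midpoint averaging)
    Ic1 = 0.5 * (m1[1:, :] + m1[:-1, :])   # average along axis 0
    Ic2 = 0.5 * (m2[:, 1:] + m2[:, :-1])   # average along axis 1
    return Ic1, Ic2

m1 = np.random.rand(33, 32)   # component stored on faces along axis 0
m2 = np.random.rand(32, 33)   # component stored on faces along axis 1
print(interp_staggered_to_centered(m1, m2)[0].shape)   # (32, 32), i.e. the centered grid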
Conclusion

Inverse problems in imaging:
    Large scale, N ∼ 10^6.
    Non-smooth (sparsity, TV, ...).
    (Sometimes) convex.
    Highly structured (separability, ℓp norms, ...).

Proximal splitting:
    Unravel the structure of the problems.
    Parallelizable.
    Decomposition G = Σ_k Gk.

Open problems:
    Less structured problems without smoothness.
    Non-convex optimization.
Proximal Splitting and Optimal Transport

  • 1. Proximal Splitting and Optimal Transport Gabriel Peyré www.numerical-tours.com
  • 2. Overview • Optimal Transport and Imaging • Convex Analysis and Proximal Calculus • Forward Backward • Douglas Rachford and ADMM • Generalized Forward-Backward • Primal-Dual Schemes
  • 3. ork, Measure Preserving Maps ica- d ofDistributions µ0 , µ1 on Rk . ase. eeds ans- that eme rate ance eval t al. µ0 µ1
  • 4. ork, Measure Preserving Maps ica- d ofDistributions µ0 , µ1 on Rk . ase. eeds Mass preserving map T : Rk Rk . ans- that µ1 = T µ0 where (T µ0 )(A) = µ0 (T (A)) 1 eme rate ance x T (x) eval t al. µ0 µ1
  • 5. ork, Measure Preserving Maps ica- d ofDistributions µ0 , µ1 on Rk . ase. eeds Mass preserving map T : Rk Rk . ans- that µ1 = T µ0 where (T µ0 )(A) = µ0 (T (A)) 1 eme rate ance x T (x) eval t al. µ0 µ1 Distributions with densities: µi = i (x)dx T µ0 = µ1 1 (T (x))|det ⇥T (x)| = 0 (x)
  • 6. Optimal Transport Lp optimal transport: W2 (µ0 , µ1 )p = min ||T (x) x||p µ0 (dx) T µ0 =µ1
  • 7. Optimal Transport Lp optimal transport: W2 (µ0 , µ1 )p = min ||T (x) x||p µ0 (dx) T µ0 =µ1 Regularity condition: µ0 or µ1 does not give mass to “small sets”. Theorem (p > 1): there exists a unique optimal T . T T µ1 µ0
  • 8. Optimal Transport Lp optimal transport: W2 (µ0 , µ1 )p = min ||T (x) x||p µ0 (dx) T µ0 =µ1 Regularity condition: µ0 or µ1 does not give mass to “small sets”. Theorem (p > 1): there exists a unique optimal T . Theorem (p = 2): T is defined as T = with convex. T T T (x) T (x ) T is monotone: µ1 x T (x) T (x ), x x 0 µ0 x
  • 9. Wasserstein Distance µ Couplings: µ, x A Rd , ⇥(A Rd ) = µ(A) y B Rd , ⇥(Rd B) = (B)
  • 10. Wasserstein Distance µ Couplings: µ, x A Rd , ⇥(A Rd ) = µ(A) y B Rd , ⇥(Rd B) = (B) Transportation cost: Wp (µ, )p = min c(x, y)d⇥(x, y) µ, Rd Rd
  • 11. Wasserstein Distance µ Couplings: µ, x A Rd , ⇥(A Rd ) = µ(A) y B Rd , ⇥(Rd B) = (B) Transportation cost: Wp (µ, )p = min c(x, y)d⇥(x, y) µ, Rd Rd
  • 12. Optimal Transport Let p > 1 and µ does not vanish on small sets. Unique µ, s.t. Wp (µ, )p = c(x, y)d⇥(x, y) Rd Rd Optimal transport T : Rd Rd : µ x y (x, T (x))
  • 13. Optimal Transport Let p > 1 and µ does not vanish on small sets. Unique µ, s.t. Wp (µ, )p = c(x, y)d⇥(x, y) Rd Rd Optimal transport T : Rd Rd : µ x p = 2: T = unique solution of y ⇥ is convex l.s.c. (x, T (x)) ( ⇥)⇤µ =
  • 14. 1-D Continuous Wasserstein Distributions µ, on R. t Cumulative functions: Cµ (t) = dµ(x) For all p > 1: T =C 1 Cµ T is non-decreasing (“change of contrast”)
  • 15. 1-D Continuous Wasserstein Distributions µ, on R. t Cumulative functions: Cµ (t) = dµ(x) For all p > 1: T =C 1 Cµ T is non-decreasing (“change of contrast”) Explicit formulas: 1 H Wp (µ, )p = |Cµ 1 C 1 p | 0 W1 (µ, ) = |Cµ C | = ||(Cµ C ) ⇥ H||1 R
  • 16. Grayscale Histogram Transfer f1 Input images: fi : [0, 1] 2 [0, 1], i = 0, 1. f0
  • 17. Grayscale Histogram Transfer f1 Input images: fi : [0, 1] 2 [0, 1], i = 0, 1. Gray-value distributions: µi defined on [0, 1]. µi ([a, b]) = 1{a f b} (x)dx [0,1]2 µ1 f0 µ0
  • 18. Grayscale Histogram Transfer f1 Input images: fi : [0, 1] 2 [0, 1], i = 0, 1. Gray-value distributions: µi defined on [0, 1]. µi ([a, b]) = 1{a f b} (x)dx [0,1]2 Optimal transport: T = Cµ11 Cµ0 . µ1 f0 Cµ0 (f0 ) T (f0 ) Cµ0 Cµ11 µ0 µ1
  • 19. pplication to Color Transfer Color Histogram Equalization 1 Input color images: fi RN 3 . projection iof= to style Sliced Wasserstein ⇥ X N x fi (x) image color statistics Y Optimal transport framework Sliced Wasserstein projection Applications Application to Color Transfer Source image (X ) f1 f0 Sliced Wasserstein project image color statistics Y f0 Source image after color transfer µ1 image (Y ) Style Source image (X ) µ0 J. Rabin Wasserstein Regularization
  • 20. pplication to Color Transfer Color Histogram Equalization 1 Input color images: fi RN 3 . projection iof= to style Sliced Wasserstein ⇥ X N x fi (x) image color statistics Y Optimal assignement: min ||f0 f1 ⇥ || N Optimal transport framework Sliced Wasserstein projection Applications Application to Color Transfer Source image (X ) f1 f0 Sliced Wasserstein project image color statistics Y f0 Source image after color transfer µ1 image (Y ) Style Source image (X ) µ0 J. Rabin Wasserstein Regularization
  • 21. pplication to Color Transfer Color Histogram Equalization 1 Input color images: fi RN 3 . projection iof= to style Sliced Wasserstein ⇥ X N x fi (x) image color statistics Y Optimal assignement: min ||f0 f1 ⇥ || N Transport: T : f0 (x) R3 f1 ( (i)) R3 Optimal transport framework Sliced Wasserstein projection Applications Application to Color Transfer Source image (X ) f1 f0 Sliced Wasserstein project image color statistics Y f0 Source image after color transfer µ1 image (Y ) Style Source image (X ) µ0 T J. Rabin Wasserstein Regularization
  • 22. pplication to Color Transfer Color Histogram Equalization 1 Input color images: fi RN 3 . projection iof= to style Sliced Wasserstein ⇥ X N x fi (x) image color statistics Y Optimal assignement: min ||f0 f1 ⇥ || N Optimal transport framework Sliced Wasserstein projection Applications Transport: T : f0 (x) R3Application to Color Transfer R3 f1 ( (i)) Optimal transport framework Sliced Wasserstein projection Applications ˜ Application to ColorfTransfer Equalization:) f0 = T (f0 ) ˜ = f1 0 Sliced Wasserstein projection of X to sty Source image (X image color statistics Y f1 f0 T (f0 ) Sliced Wasserstein project image color statistics Y Source image (X ) T f0 Source image after color transfer µ1 image (Y ) Style Source image (X ) µ0 Source image after color transfer µ1 Style image (Y ) T J. Rabin Wasserstein Regularization J. Rabin Wasserstein Regularization
  • 23. cðdvÞ ¼ l0> þ dvÞ detðrðv þ dvÞÞ À l1 ¼ 0: ðv can be thought as an elliptic system thought as anThe sys-system of equations. The trilinearRelaxation was performed for transferring v cc > tem cv cc can be of equations. elliptic a sys- the GPU. We used cubic grid. interpolation used a trilineara parallelizable four- the GPU. We operator using interpolation operator for transferring Image Registration Ittem isto verify that a correction for dv can be obtained by solving with an is easy solved using preconditioned conjugate gradient color Gauss-Seidel relaxation scheme. Thisrestriction s solved using preconditioned conjugate À1 gradient with an the coarse grid residual increases robustness the coarse grid correction to fine grids. Thecorrection to fine grids. The residual restriction the system dv % c> ðcv c> Þ cðvÞ (Nocedal and Wright, 1999) The sys- and efficiency and is especially suited for the implementation on incomplete Cholesky preconditioner. mplete Cholesky preconditioner. v v operator for projecting residual from for projecting residual from the fine to coarse grids is operator the fine to coarse grids is tem c c> can be thought as an elliptic system of equations. The sys- v c the GPU. We used a trilinear interpolation operator for transferring tem is solved using preconditioned conjugate gradient with an the coarse grid correction to fine grids. The residual restriction incomplete Cholesky preconditioner. operator for projecting residual from the fine to coarse grids is T [ur Rehman et al, 2009] Fig. 6. OMT Results viewed on an axial slice. The top row shows corresponding slices from Pre-op(Left) and Post-op(Right) MRI data. The deformation is clearly visible in the anterior part of the brain.
  • 24. Convex Formulation (Benamou-Brenier) ⇢ ⇢ : Rd ⇥ [0, 1] ! R+ solving: Find m : Rd ⇥ [0, 1] ! Rd W (µ0 , µ1 )2 = min J(x) + ◆C (x) x=(m,⇢)
  • 25. Convex Formulation (Benamou-Brenier) ⇢ ⇢ : Rd ⇥ [0, 1] ! R+ solving: Find m : Rd ⇥ [0, 1] ! Rd W (µ0 , µ1 )2 = min J(x) + ◆C (x) x=(m,⇢) Z Z 1 J(x) = j(x(s, t))dtds s2Rd t=0 8 ||m||2 < ˜˜ ⇢ if ⇢ > 0, ˜ j(m, ⇢) = ˜ ˜ : 0 if ⇢ = 0 and m = 0, ˜ ˜ +1 otherwise. 2 R 2 R2
  • 26. Convex Formulation (Benamou-Brenier) ⇢ ⇢ : Rd ⇥ [0, 1] ! R+ solving: Find m : Rd ⇥ [0, 1] ! Rd W (µ0 , µ1 )2 = min J(x) + ◆C (x) x=(m,⇢) Z Z 1 J(x) = j(x(s, t))dtds s2Rd t=0 8 ||m||2 < ˜˜ ⇢ if ⇢ > 0, ˜ j(m, ⇢) = ˜ ˜ : 0 if ⇢ = 0 and m = 0, ˜ ˜ +1 otherwise. 2 R 2 R2 C = {x = (m, ⇢) div(x) = 0, B(⇢) = (⇢0 , ⇢1 )} B(⇢) = (⇢(0, ·), ⇢(1, ·))
  • 28. Numerical Examples ⇢0 ⇢1 con- work, plica- ad of ease. peeds rans- t that heme erate mance ieval et al. t Figure 7: Synthetic 2D examples on a Euclidean domain. The
  • 29. Discrete Formulation s Centered grid formulation (d = 1): min J(x) + ◆C (x) x2RGc ⇥2 P J(x) = i2Gc j(xi ) t Centered grid Gc
  • 30. Discrete Formulation s Centered grid formulation (d = 1): min J(x) + ◆C (x) x2RGc ⇥2 P J(x) = i2Gc j(xi ) t Staggered grid formulation : Centered grid Gc min 2 J(I(x)) + ◆C (x) s 1 x2RGst ⇥RGst t Staggered grid 1 2 Gst Gst
  • 31. Discrete Formulation s Centered grid formulation (d = 1): min J(x) + ◆C (x) x2RGc ⇥2 P J(x) = i2Gc j(xi ) t Staggered grid formulation : Centered grid Gc min 2 J(I(x)) + ◆C (x) s 1 x2RGst ⇥RGst Interpolation operator: 1 2 Gst Gst 1 2 I = (I , I ) : R ⇥R ! RG c t 2I1 (m)i,j = mi+ 1 ,j + mi 2 1 2 ,j Staggered grid ! Projection on div(x) = 0 using FFTs. 1 2 Gst Gst
  • 32. SOCP Formulation P min J(x) + ◆C (x) J(x) = i2Gc j(xi ) x2RGc ⇥d X () min ri s.t. 8 i 2 Gc , (mi , ⇢i , ri ) 2 K x2RGc ⇥d ,r2RGc i (Rotated) Lorentz cone: K = (m, ⇢, r) 2 Rd+2 ||m||2 6 ⇢r ˜ ˜ ˜ ˜ ˜˜
  • 33. SOCP Formulation P min J(x) + ◆C (x) J(x) = i2Gc j(xi ) x2RGc ⇥d X () min ri s.t. 8 i 2 Gc , (mi , ⇢i , ri ) 2 K x2RGc ⇥d ,r2RGc i (Rotated) Lorentz cone: K = (m, ⇢, r) 2 Rd+2 ||m||2 6 ⇢r ˜ ˜ ˜ ˜ ˜˜ Second order cone program: ! Use interior point methods (e.g. MOSEK software). Linear convergence with iteration #. Poor scaling with dimension |Gc |. E cient for medium scale problems (N ⇠ 104 ).
  • 34. 1 Example: Regularization Inverse problem: measurements y = x0 + w x0 y
  • 35. 1 Example: Regularization Inverse problem: measurements y = x0 + w x0 y x? argmin Regularized inversion: x? 2 argmin 1 ||y 2 x||2 + R(x) x2R N Data fidelity Regularity
  • 36. 1 Example: Regularization Inverse problem: measurements y = x0 + w x0 y x? argmin Regularized inversion: x? 2 argmin 1 ||y 2 x||2 + R(x) x2R N Data fidelity Regularity P Total Variation: R(x) = i ||(rx)i ||
  • 37. 1 Example: Regularization Inverse problem: measurements y = x0 + w x0 y x? argmin Regularized inversion: x? 2 argmin 1 ||y 2 x||2 + R(x) x2R N Data fidelity Regularity P Total Variation: R(x) = i ||(rx)i || 1 P ⇤ ` sparsity: R(x) = i |xi | Images are sparse in wavelet bases. ⇤ Image f = x Coe↵. x = f
  • 38. Overview • Optimal Transport and Imaging • Convex Analysis and Proximal Calculus • Forward Backward • Douglas Rachford and ADMM • Generalized Forward-Backward • Primal-Dual Schemes
  • 39. Convex Optimization Setting: G : H R ⇤ {+⇥} H: Hilbert space. Here: H = RN . Problem: min G(x) x H
  • 40. Convex Optimization Setting: G : H R ⇤ {+⇥} H: Hilbert space. Here: H = RN . Problem: min G(x) x H Class of functions: x y Convex: G(tx + (1 t)y) tG(x) + (1 t)G(y) t [0, 1]
  • 41. Convex Optimization Setting: G : H R ⇤ {+⇥} H: Hilbert space. Here: H = RN . Problem: min G(x) x H Class of functions: x y Convex: G(tx + (1 t)y) tG(x) + (1 t)G(y) t [0, 1] Lower semi-continuous: lim inf G(x) G(x0 ) x x0 Proper: {x ⇥ H G(x) ⇤= + } = ⌅ ⇤
  • 42. Convex Optimization Setting: G : H R ⇤ {+⇥} H: Hilbert space. Here: H = RN . Problem: min G(x) x H Class of functions: x y Convex: G(tx + (1 t)y) tG(x) + (1 t)G(y) t [0, 1] Lower semi-continuous: lim inf G(x) G(x0 ) x x0 Proper: {x ⇥ H G(x) ⇤= + } = ⌅ ⇤ 0 if x ⇥ C, Indicator: C (x) = + otherwise. (C closed and convex)
  • 43. Sub-differential Sub-di erential: G(x) = {u ⇥ H ⇤ z, G(z) G(x) + ⌅u, z x⇧} G(x) = |x| G(0) = [ 1, 1]
  • 44. Sub-differential Sub-di erential: G(x) = {u ⇥ H ⇤ z, G(z) G(x) + ⌅u, z x⇧} G(x) = |x| Smooth functions: If F is C 1 , F (x) = { F (x)} G(0) = [ 1, 1]
  • 45. Sub-differential Sub-di erential: G(x) = {u ⇥ H ⇤ z, G(z) G(x) + ⌅u, z x⇧} G(x) = |x| Smooth functions: If F is C 1 , F (x) = { F (x)} G(0) = [ 1, 1] First-order conditions: x argmin G(x) 0 G(x ) x H
  • 46. Sub-differential Sub-di erential: G(x) = {u ⇥ H ⇤ z, G(z) G(x) + ⌅u, z x⇧} G(x) = |x| Smooth functions: If F is C 1 , F (x) = { F (x)} G(0) = [ 1, 1] First-order conditions: x argmin G(x) 0 G(x ) x H U (x) x Monotone operator: U (x) = G(x) (u, v) U (x) U (y), y x, v u 0
  • 47. Prox and Subdifferential 1 Prox G (x) = argmin ||x z||2 + G(z) z 2
  • 48. Prox and Subdifferential 1 Prox G (x) = argmin ||x z||2 + G(z) z 2 Resolvant of G: z = Prox G (x) 0 z x + ⇥G(z) x (Id + ⇥G)(z)
  • 49. Prox and Subdifferential 1 Prox G (x) = argmin ||x z||2 + G(z) z 2 Resolvant of G: z = Prox G (x) 0 z x + ⇥G(z) x (Id + ⇥G)(z) z = (Id + ⇥G) 1 (x) Inverse of a set-valued mapping: where x U (y) y U 1 (x) Prox G = (Id + ⇥G) 1 is a single-valued mapping
  • 50. Prox and Subdifferential 1 Prox G (x) = argmin ||x z||2 + G(z) z 2 Resolvant of G: z = Prox G (x) 0 z x + ⇥G(z) x (Id + ⇥G)(z) z = (Id + ⇥G) 1 (x) Inverse of a set-valued mapping: where x U (y) y U 1 (x) Prox G = (Id + ⇥G) 1 is a single-valued mapping Fix point: x argmin G(x) x 0 G(x ) x (Id + ⇥G)(x ) x⇥ = (Id + ⇥G) 1 (x⇥ ) = Prox G (x⇥ )
  • 51. Proximal Calculus Separability: G(x) = G1 (x1 ) + . . . + Gn (xn ) ProxG (x) = (ProxG1 (x1 ), . . . , ProxGn (xn ))
  • 52. Proximal Calculus Separability: G(x) = G1 (x1 ) + . . . + Gn (xn ) ProxG (x) = (ProxG1 (x1 ), . . . , ProxGn (xn )) 1 Quadratic functionals: G(x) = || x y||2 2 Prox G = (Id + ) 1 = (Id + ) 1
  • 53. Proximal Calculus Separability: G(x) = G1 (x1 ) + . . . + Gn (xn ) ProxG (x) = (ProxG1 (x1 ), . . . , ProxGn (xn )) 1 Quadratic functionals: G(x) = || x y||2 2 Prox G = (Id + ) 1 = (Id + ) 1 Composition by tight frame: A A = Id ProxG A (x) =A ProxG A + Id A A
  • 54. Proximal Calculus Separability: G(x) = G1 (x1 ) + . . . + Gn (xn ) ProxG (x) = (ProxG1 (x1 ), . . . , ProxGn (xn )) 1 Quadratic functionals: G(x) = || x y||2 2 Prox G = (Id + ) 1 = (Id + ) 1 Composition by tight frame: A A = Id ProxG A (x) =A ProxG A + Id A A x Indicators: G(x) = C (x) C Prox G (x) = ProjC (x) ProjC (x) = argmin ||x z|| z C
  • 55. Prox of Sparse Regularizers 1 Prox G (x) = argmin ||x z||2 + G(z) z 2
  • 56. Prox of Sparse Regularizers 1 Prox G (x) = argmin ||x z||2 + G(z) z 2 G(x) = ||x||1 = |xi | 12 log(1 + x2 ) i 10 |x| ||x||0 8 6 4 2 G(x) = ||x||0 = | {i xi = 0} | 0 −2 G(x) −10 −8 −6 −4 −2 0 2 4 6 8 10 G(x) = log(1 + |xi |2 ) i
  • 57. Prox of Sparse Regularizers 1 Prox G (x) = argmin ||x z||2 + G(z) z 2 G(x) = ||x||1 = |xi | 12 log(1 + x2 ) i 10 |x| ||x||0 Prox G (x)i = max 0, 1 xi 8 |xi | 6 4 2 G(x) = ||x||0 = | {i xi = 0} | 0 −2 G(x) xi if |xi | 2 , −10 −8 −6 −4 −2 0 2 4 6 8 10 Prox G (x)i = 10 0 otherwise. 8 6 4 2 G(x) = log(1 + |xi |2 ) −2 0 i −4 3rd order polynomial root. −6 −8 ProxG (x) −10 −10 −8 −6 −4 −2 0 2 4 6 8 10
  • 58. Legendre-Fenchel Duality Legendre-Fenchel transform: G (u) = sup u, x G(x) eu x dom(G) G(x) S lop G (u) x
  • 59. Legendre-Fenchel Duality Legendre-Fenchel transform: G (u) = sup u, x G(x) eu x dom(G) G(x) S lop G (u) Example: quadratic functional 1 x G(x) = Ax, x + x, b 2 1 G (u) = u b, A 1 (u b) 2
  • 60. Legendre-Fenchel Duality Legendre-Fenchel transform: G (u) = sup u, x G(x) eu x dom(G) G(x) S lop G (u) Example: quadratic functional 1 x G(x) = Ax, x + x, b 2 1 G (u) = u b, A 1 (u b) 2 Moreau’s identity: Prox G (x) = x ProxG/ (x/ ) G simple G simple
  • 61. Indicator and Homogeneous Functionals Positively 1-homogeneous functional: G( x) = | |G(x) Example: norm G(x) = ||x|| Duality: G (x) = G (·) 1 (x) G (y) = min x, y G(x) 1
  • 62. Indicator and Homogeneous Functionals Positively 1-homogeneous functional: G( x) = | |G(x) Example: norm G(x) = ||x|| Duality: G (x) = G (·) 1 (x) G (y) = min x, y G(x) 1 p norms: G(x) = ||x||p 1 1 + =1 1 p, q + G (x) = ||x||q p q
  • 63. Indicator and Homogeneous Functionals Positively 1-homogeneous functional: G( x) = | |G(x) Example: norm G(x) = ||x|| Duality: G (x) = G (·) 1 (x) G (y) = min x, y G(x) 1 p norms: G(x) = ||x||p 1 1 + =1 1 p, q + G (x) = ||x||q p q Example: Proximal operator of norm Prox ||·|| = Id Proj||·||1 Proj||·||1 (x)i = max 0, 1 xi |xi | for a well-chosen ⇥ = ⇥ (x, )
  • 64. Prox of the J Functional X ||m||2 ˜ J(m, ⇢) = j(mi , ⇢i ) j(m, ⇢) = ˜ ˜ for ⇢ > 0 ˜ i ⇢˜
  • 65. Prox of the J Functional X ||m||2 ˜ J(m, ⇢) = j(mi , ⇢i ) j(m, ⇢) = ˜ ˜ for ⇢ > 0 ˜ i ⇢˜ Prox J (m, ⇢) = (Prox j (mi , ⇢i ))i
  • 66. Prox of the J Functional X ||m||2 ˜ J(m, ⇢) = j(mi , ⇢i ) j(m, ⇢) = ˜ ˜ for ⇢ > 0 ˜ i ⇢˜ Prox J (m, ⇢) = (Prox j (mi , ⇢i ))i j ⇤ = ◆C where C = (a, b) 2 R2 ⇥ R 2||a||2 + b 6 0 Prox j (˜) = x x ˜ ProjC (˜/ ) x where x = (m, ⇢) ˜ ˜ ˜
  • 67. Prox of the J Functional X ||m||2 ˜ J(m, ⇢) = j(mi , ⇢i ) j(m, ⇢) = ˜ ˜ for ⇢ > 0 ˜ i ⇢˜ Prox J (m, ⇢) = (Prox j (mi , ⇢i ))i j ⇤ = ◆C where C = (a, b) 2 R2 ⇥ R 2||a||2 + b 6 0 Prox j (˜) = x x ˜ ProjC (˜/ ) x where x = (m, ⇢) ˜ ˜ ˜ ⇢ (m? , ⇢? ) if ⇢? > 0 Proposition: Prox (m, ⇢) = ˜ ˜ (0, 0) otherwise. ⇢? m ˜ ? where m = ? and ⇢? is the largest root of ⇢ +2 X 3 + (4 ⇢)X 2 + 4 ( ˜ ⇢)X ˜ ||m||2 ˜ 4 2 ⇢=0 ˜
  • 68. Overview • Optimal Transport and Imaging • Convex Analysis and Proximal Calculus • Forward Backward • Douglas Rachford and ADMM • Generalized Forward-Backward • Primal-Dual Schemes
  • 69. Gradient and Proximal Descents Gradient descent: x( +1) = x( ) G(x( ) ) [explicit] G is C 1 and G is L-Lipschitz Theorem: If 0 < < 2/L, x( ) x a solution.
  • 70. Gradient and Proximal Descents Gradient descent: x( +1) = x( ) G(x( ) ) [explicit] G is C 1 and G is L-Lipschitz Theorem: If 0 < < 2/L, x( ) x a solution. Sub-gradient descent: x( +1) = x( ) v( ) , v( ) G(x( ) ) Theorem: If 1/⇥, x( ) x a solution. Problem: slow.
  • 71. Gradient and Proximal Descents Gradient descent: x( +1) = x( ) G(x( ) ) [explicit] G is C 1 and G is L-Lipschitz Theorem: If 0 < < 2/L, x( ) x a solution. Sub-gradient descent: x( +1) = x( ) v( ) , v( ) G(x( ) ) Theorem: If 1/⇥, x( ) x a solution. Problem: slow. Proximal-point algorithm: x(⇥+1) = Prox G (x(⇥) ) [implicit] Theorem: If c > 0, x( ) x a solution. Prox G hard to compute. [Rockafellar, 70]
  • 72. Proximal Splitting Methods Solve min E(x) x H Problem: Prox E is not available.
  • 73. Proximal Splitting Methods Solve min E(x) x H Problem: Prox E is not available. Splitting: E(x) = F (x) + Gi (x) i Smooth Simple
  • 74. Proximal Splitting Methods Solve min E(x) x H Problem: Prox E is not available. Splitting: E(x) = F (x) + Gi (x) i Smooth Simple F (x) Iterative algorithms using: Prox Gi (x) solves Forward-Backward: F + G Douglas-Rachford: Gi Primal-Dual: Gi A Generalized FB: F+ Gi
  • 75. Smooth + Simple Splitting Inverse problem: measurements y = Kf0 + w f0 Kf0 K K : RN RP , P N Model: f0 = x0 sparse in dictionary . Sparse recovery: f = x where x solves min F (x) + G(x) x RN Smooth Simple 1 Data fidelity: F (x) = ||y x||2 =K ⇥ 2 Regularization: G(x) = ||x||1 = |xi | i
  • 76. Forward-Backward Fix point equation: x argmin F (x) + G(x) 0 F (x ) + G(x ) x (x F (x )) x + ⇥G(x ) x⇥ = Prox G (x⇥ F (x⇥ ))
  • 77. Forward-Backward Fix point equation: x argmin F (x) + G(x) 0 F (x ) + G(x ) x (x F (x )) x + ⇥G(x ) x⇥ = Prox G (x⇥ F (x⇥ )) Forward-backward: x(⇥+1) = Prox G x(⇥) F (x(⇥) )
  • 78. Forward-Backward Fix point equation: x argmin F (x) + G(x) 0 F (x ) + G(x ) x (x F (x )) x + ⇥G(x ) x⇥ = Prox G (x⇥ F (x⇥ )) Forward-backward: x(⇥+1) = Prox G x(⇥) F (x(⇥) ) Projected gradient descent: G= C
  • 79. Forward-Backward Fix point equation: x argmin F (x) + G(x) 0 F (x ) + G(x ) x (x F (x )) x + ⇥G(x ) x⇥ = Prox G (x⇥ F (x⇥ )) Forward-backward: x(⇥+1) = Prox G x(⇥) F (x(⇥) ) Projected gradient descent: G= C Theorem: Let F be L-Lipschitz. If < 2/L, x( ) x a solution of ( ) [Passty 79, Gabay, 83]
• 80. Example: L1 Regularization — min_x ½||Φx − y||² + λ||x||₁, i.e. min_x F(x) + G(x) with F(x) = ½||Φx − y||², ∇F(x) = Φ*(Φx − y), L = ||Φ*Φ||, and G(x) = λ||x||₁, Prox_{τG}(x)_i = max(0, 1 − τλ/|x_i|) x_i. Forward-backward ⟺ iterative soft thresholding.
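As an illustration (not part of the slides), a minimal ISTA implementation of this forward-backward scheme for ℓ1-regularized least squares; Phi, y, lam are generic placeholders.

```python
import numpy as np

def ista(Phi, y, lam, n_iter=200):
    """Forward-backward (ISTA) for min_x 1/2 ||Phi x - y||^2 + lam * ||x||_1."""
    L = np.linalg.norm(Phi, 2) ** 2              # Lipschitz constant of the gradient
    tau = 1.0 / L                                # step size, tau < 2/L
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        x = x - tau * (Phi.T @ (Phi @ x - y))    # forward (explicit) step on F
        x = np.sign(x) * np.maximum(np.abs(x) - tau * lam, 0)  # backward step: prox of tau*lam*||.||_1
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Phi = rng.standard_normal((50, 200))
    x0 = np.zeros(200); x0[rng.choice(200, 5, replace=False)] = 1.0
    y = Phi @ x0
    print(np.round(ista(Phi, y, lam=0.05), 2).nonzero()[0])
```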
• 81. Convergence Speed — min_x E(x) = F(x) + G(x), with ∇F L-Lipschitz and G simple. Theorem: if L > 0, the FB iterates x^(ℓ) satisfy E(x^(ℓ)) − E(x*) ≤ C/ℓ, where the constant C degrades with L.
• 82. Multi-steps Accelerations — Beck-Teboulle accelerated FB (FISTA): t^(0) = 1; x^(ℓ+1) = Prox_{G/L}(y^(ℓ) − (1/L)∇F(y^(ℓ))); t^(ℓ+1) = (1 + √(1 + 4(t^(ℓ))²))/2; y^(ℓ+1) = x^(ℓ+1) + ((t^(ℓ) − 1)/t^(ℓ+1))(x^(ℓ+1) − x^(ℓ)) (see also Nesterov's method). Theorem: if L > 0, E(x^(ℓ)) − E(x*) ≤ C/ℓ². Complexity theory: optimal in a worst-case sense.
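A sketch of the Beck-Teboulle recursion above, applied to the same ℓ1 problem as before (my own fista helper; the step 1/L and the extrapolation weights follow the formulas on the slide).

```python
import numpy as np

def fista(Phi, y, lam, n_iter=200):
    """Accelerated forward-backward (FISTA) for 1/2 ||Phi x - y||^2 + lam * ||x||_1."""
    L = np.linalg.norm(Phi, 2) ** 2
    x = z = np.zeros(Phi.shape[1])
    t = 1.0
    for _ in range(n_iter):
        x_prev = x
        x = z - (Phi.T @ (Phi @ z - y)) / L                      # gradient step at the extrapolated point
        x = np.sign(x) * np.maximum(np.abs(x) - lam / L, 0)      # prox_{G/L}
        t_next = (1 + np.sqrt(1 + 4 * t ** 2)) / 2
        z = x + (t - 1) / t_next * (x - x_prev)                  # extrapolation step
        t = t_next
    return x
```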
  • 83. Overview • Optimal Transport and Imaging • Convex Analysis and Proximal Calculus • Forward Backward • Douglas Rachford and ADMM • Generalized Forward-Backward • Primal-Dual Schemes
• 84. Douglas Rachford Scheme — min_x G1(x) + G2(x) (★), with G1 and G2 both simple. Douglas-Rachford iterations: z^(ℓ+1) = (1 − α/2) z^(ℓ) + (α/2) RProx_{γG2}(RProx_{γG1}(z^(ℓ))), x^(ℓ+1) = Prox_{γG1}(z^(ℓ+1)), where the reflexive prox is RProx_{γG}(x) = 2 Prox_{γG}(x) − x.
• 85. Douglas Rachford Scheme — Same iterations: z^(ℓ+1) = (1 − α/2) z^(ℓ) + (α/2) RProx_{γG2}(RProx_{γG1}(z^(ℓ))), x^(ℓ+1) = Prox_{γG1}(z^(ℓ+1)). Theorem: if 0 < α < 2 and γ > 0, x^(ℓ) → x*, a solution of (★). [Lions, Mercier, 79]
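A generic sketch of these DR iterations with user-supplied proximal maps; following the fixed-point derivation on the next slides, the primal iterate is read off as x^(ℓ) = Prox_{γG1}(z^(ℓ)). The callable interface and names are my own.

```python
import numpy as np

def douglas_rachford(prox_g1, prox_g2, z0, alpha=1.0, n_iter=500):
    """Douglas-Rachford for min_x G1(x) + G2(x), both simple.

    prox_g1, prox_g2: callables computing Prox_{gamma*Gi} (gamma absorbed in the callables).
    Returns x = Prox_{gamma*G1}(z) after n_iter iterations.
    """
    rprox = lambda prox, v: 2 * prox(v) - v          # reflexive prox
    z = z0.copy()
    for _ in range(n_iter):
        z = (1 - alpha / 2) * z + (alpha / 2) * rprox(prox_g2, rprox(prox_g1, z))
    return prox_g1(z)
```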
• 86. DR Fix Point Equation — min_x G1(x) + G2(x) ⟺ 0 ∈ ∂(G1 + G2)(x) ⟺ ∃z, z − x ∈ γ∂G1(x) and x − z ∈ γ∂G2(x) ⟺ x = Prox_{γG1}(z) and (2x − z) − x ∈ γ∂G2(x).
• 87. DR Fix Point Equation — Continuing: x = Prox_{γG1}(z) and (2x − z) − x ∈ γ∂G2(x) ⟺ x = Prox_{γG2}(2x − z) = Prox_{γG2}(RProx_{γG1}(z)) ⟺ z = 2 Prox_{γG2}(RProx_{γG1}(z)) − (2x − z) ⟺ z = 2 Prox_{γG2}(RProx_{γG1}(z)) − RProx_{γG1}(z) ⟺ z = RProx_{γG2}(RProx_{γG1}(z)) ⟺ z = ½ z + ½ RProx_{γG2}(RProx_{γG1}(z)).
• 88. Example: Optimal Transport on Centered Grid — min_{x∈R^(Gc×2)} J(x) + ι_C(x), with C = {x = (m, ρ) : Ax = b}, b = (0, ρ0, ρ1), A(x) = (div(x), ρ|_{t=0}, ρ|_{t=1}). [figures: input densities I0, I1; centered space-time grid Gc]
• 89. Example: Optimal Transport on Centered Grid — Same problem. Prox_{γJ}: closed form via the root of a cubic (slide 67).
• 90. Example: Optimal Transport on Centered Grid — Prox_{γJ}: closed form (cubic root). Prox_{ιC} = Proj_C: Proj_C(x) = (Id − A*Δ⁻¹A)x + A*Δ⁻¹b with Δ = AA*, i.e. solving a Poisson equation with boundary conditions.
• 91. Example: Optimal Transport on Centered Grid — Proposition: DR with α = 1 is the ALG2 algorithm of [Benamou, Brenier 2000].
• 92. Example: Optimal Transport on Centered Grid — → Advantage of DR: free relaxation parameter α ∈ ]0, 2[.
• 93. Example: Constrained L1 — min_{Φx=y} ||x||₁ ⟺ min_x G1(x) + G2(x) with G1(x) = ι_C(x), C = {x : Φx = y}, Prox_{γG1}(x) = Proj_C(x) = x + Φ*(ΦΦ*)⁻¹(y − Φx), and G2(x) = ||x||₁, Prox_{γG2}(x)_i = max(0, 1 − γ/|x_i|) x_i. Efficient if ΦΦ* is easy to invert.
• 94. Example: Constrained L1 — Same splitting. Example: compressed sensing, Φ ∈ R^(100×400) Gaussian matrix, y = Φx₀ with ||x₀||₀ = 17. [plot: log10(||x^(ℓ)||₁ − ||x*||₁) vs. iteration ℓ, for γ = 0.01, 1, 10]
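A sketch of this compressed-sensing example with the DR iterations above (α = 1); the 100×400 Gaussian matrix and the 17-sparse x₀ follow the slide, while the remaining choices (seed, γ, iteration count) are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
P, N = 100, 400
Phi = rng.standard_normal((P, N))
x0 = np.zeros(N); x0[rng.choice(N, 17, replace=False)] = rng.standard_normal(17)
y = Phi @ x0

pinv = Phi.T @ np.linalg.inv(Phi @ Phi.T)            # Phi^* (Phi Phi^*)^{-1}, factored once
proj_C = lambda x: x + pinv @ (y - Phi @ x)          # Prox of i_C, C = {x : Phi x = y}
soft = lambda x, g: np.sign(x) * np.maximum(np.abs(x) - g, 0)  # Prox of g*||.||_1

gamma, z = 1.0, np.zeros(N)
for _ in range(300):
    r1 = 2 * proj_C(z) - z                           # RProx_{gamma G1}(z), G1 = i_C
    r2 = 2 * soft(r1, gamma) - r1                    # RProx_{gamma G2}(r1), G2 = ||.||_1
    z = 0.5 * z + 0.5 * r2                           # DR update with alpha = 1
x = proj_C(z)
print(np.linalg.norm(x - x0))                        # should be small: exact recovery regime
```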
• 95. Auxiliary Variables with DR — min_x G1(x) + G2(A(x)), with A : H → E linear and G1, G2 simple. Rewrite as min_{z∈H×E} G(z) + ι_C(z), with G(x, y) = G1(x) + G2(y) and C = {(x, y) ∈ H × E : Ax = y}.
• 96. Auxiliary Variables with DR — Prox_{γG}(x, y) = (Prox_{γG1}(x), Prox_{γG2}(y)); Prox_{γιC}(x, y) = Proj_C(x, y) = (x̃, Ax̃) = (x + A*ỹ, y − ỹ), where x̃ = (Id + A*A)⁻¹(A*y + x) and ỹ = (Id + AA*)⁻¹(y − Ax). Efficient if Id + AA* or Id + A*A is easy to invert.
• 97. Example: TV Regularization — min_f ½||Kf − y||² + λ||∇f||₁, where ||u||₁ = Σ_i ||u_i||. Auxiliary variable u = ∇f: G1(u) = λ||u||₁ with Prox_{γG1}(u)_i = max(0, 1 − γλ/||u_i||) u_i; G2(f) = ½||Kf − y||² with Prox_{γG2}(f) = (Id + γK*K)⁻¹(f + γK*y); C = {(f, u) ∈ R^N × R^(N×2) : u = ∇f} with Proj_C(f, u) = (f̃, ∇f̃).
• 98. Example: TV Regularization — Same splitting; f̃ is obtained by solving (Id − Δ)f̃ = f − div(u): O(N log N) operations using the FFT.
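A sketch of this projection step only, assuming periodic boundary conditions (an assumption; the slide does not specify the discretization): the linear system (Id − Δ)f̃ = f − div(u) is diagonalized by the 2-D FFT.

```python
import numpy as np

def grad(f):
    # Forward-difference gradient, periodic boundary conditions.
    return np.stack([np.roll(f, -1, axis=0) - f, np.roll(f, -1, axis=1) - f])

def div(u):
    # Divergence = -grad^* (periodic boundary conditions).
    return (u[0] - np.roll(u[0], 1, axis=0)) + (u[1] - np.roll(u[1], 1, axis=1))

def proj_C(f, u):
    """Projection onto C = {(f, u) : u = grad(f)}: solve (Id - Delta) ftilde = f - div(u) by FFT."""
    n = f.shape[0]                                    # square image assumed
    k1, k2 = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    lap_eig = -4 + 2 * np.cos(2 * np.pi * k1 / n) + 2 * np.cos(2 * np.pi * k2 / n)
    rhs = f - div(u)
    ftilde = np.real(np.fft.ifft2(np.fft.fft2(rhs) / (1 - lap_eig)))
    return ftilde, grad(ftilde)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    f, u = rng.standard_normal((32, 32)), rng.standard_normal((2, 32, 32))
    ft, ut = proj_C(f, u)
    print(np.allclose(ft - div(grad(ft)), f - div(u)))   # checks (Id - Delta) ftilde = f - div(u)
```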
• 99. Example: TV Regularization — [figures: original f₀; observations y = K f₀ + w; TV recovery f*; convergence vs. iteration]
• 100. Alternating Direction Method of Multipliers — min_x F(x) + G(A(x)) (★) ⟺ min_{x, y=Ax} F(x) + G(y), with A : R^N → R^P injective.
• 101. Alternating Direction Method of Multipliers — Lagrangian: min_{x,y} max_u L(x, y, u) = F(x) + G(y) + ⟨u, y − Ax⟩.
• 102. Alternating Direction Method of Multipliers — Augmented Lagrangian: min_{x,y} max_u L_γ(x, y, u) = L(x, y, u) + (γ/2)||y − Ax||².
• 103. Alternating Direction Method of Multipliers — ADMM: x^(ℓ+1) = argmin_x L_γ(x, y^(ℓ), u^(ℓ)); y^(ℓ+1) = argmin_y L_γ(x^(ℓ+1), y, u^(ℓ)); u^(ℓ+1) = u^(ℓ) + γ(y^(ℓ+1) − Ax^(ℓ+1)).
• 104. Alternating Direction Method of Multipliers — Same ADMM iterations. Theorem: if γ > 0, x^(ℓ) → x*, a solution of (★). [Gabay, Mercier, Glowinski, Marrocco, 76]
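An illustrative ADMM instance (not from the slides): F(x) = ½||x − c||² and G = λ||·||₁ composed with a 1-D finite-difference operator A, so both argmin steps of the augmented Lagrangian are explicit. The multiplier u is kept in scaled form, absorbing the factor γ.

```python
import numpy as np

def admm_tv1d(c, lam, gamma=1.0, n_iter=500):
    """ADMM for min_x 1/2 ||x - c||^2 + lam * ||A x||_1, A = 1-D finite differences.

    Split as min_{x, y = A x} F(x) + G(y), F(x) = 1/2 ||x - c||^2, G = lam * ||.||_1;
    u is the scaled multiplier of the constraint y = A x.
    """
    n = len(c)
    A = np.eye(n, k=1)[:-1] - np.eye(n)[:-1]            # (n-1) x n difference matrix
    x, y, u = c.copy(), np.zeros(n - 1), np.zeros(n - 1)
    M = np.linalg.inv(np.eye(n) + gamma * A.T @ A)      # factor the x-update once
    soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0)
    for _ in range(n_iter):
        x = M @ (c + gamma * A.T @ (y + u))             # argmin_x of the augmented Lagrangian
        y = soft(A @ x - u, lam / gamma)                # argmin_y: prox of G / gamma
        u = u + (y - A @ x)                             # ascent on the scaled multiplier
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    c = np.concatenate([np.zeros(20), np.ones(20)]) + 0.1 * rng.standard_normal(40)
    print(np.round(admm_tv1d(c, lam=0.5), 2))
```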
• 105. ADMM with Proximal Operators — Proximal mapping for the metric A (A injective): Prox^A_{γF}(z) = argmin_x ½||Ax − z||² + γF(x).
• 106. ADMM with Proximal Operators — Proposition: Prox^A_{γF} = A⁺ ∘ (Id − γ Prox_{(F*∘A*)/γ}(·/γ)).
• 107. ADMM with Proximal Operators — ADMM in proximal form (scaled multiplier u): x^(ℓ+1) = Prox^A_{F/γ}(y^(ℓ) + u^(ℓ)); y^(ℓ+1) = Prox_{G/γ}(Ax^(ℓ+1) − u^(ℓ)); u^(ℓ+1) = u^(ℓ) + y^(ℓ+1) − Ax^(ℓ+1).
• 108. ADMM with Proximal Operators — → If G∘A is simple: use DR. → If F*∘A* is simple: use ADMM.
• 109. ADMM vs. DR — Fenchel-Rockafellar duality: min_x F(x) + G(A(x)) ↔ min_u F*(−A*u) + G*(u). Important: no bijection between u and x.
• 110. ADMM vs. DR — Proposition: DR applied to the dual problem F*∘(−A*) + G* is ADMM. [Eckstein, Bertsekas, 92]
• 111. ADMM vs. DR — DR iterations (with α = 1): z^(ℓ+1) = ½ z^(ℓ) + ½ RProx_{γ(F*∘(−A*))}(RProx_{γG*}(z^(ℓ))).
• 112. ADMM vs. DR — The ADMM iterates are recovered from the DR ones via: y^(ℓ) = (1/γ)(z^(ℓ) − u^(ℓ)), x^(ℓ+1) = Prox^A_{F/γ}(y^(ℓ) + u^(ℓ)), u^(ℓ) = Prox_{γG*}(z^(ℓ)).
• 113. More than 2 Functionals — min_x G1(x) + ... + Gk(x), each Gi simple. Rewrite as min G(x1, ..., xk) + ι_C(x1, ..., xk), with G(x1, ..., xk) = G1(x1) + ... + Gk(xk) and C = {(x1, ..., xk) ∈ H^k : x1 = ... = xk}.
• 114. More than 2 Functionals — G and ι_C are simple: Prox_{γG}(x1, ..., xk) = (Prox_{γGi}(xi))_i and Prox_{γιC}(x1, ..., xk) = (x̃, ..., x̃), where x̃ = (1/k) Σ_i xi.
  • 115. Overview • Optimal Transport and Imaging • Convex Analysis and Proximal Calculus • Forward Backward • Douglas Rachford and ADMM • Generalized Forward-Backward • Primal-Dual Schemes
• 116. GFB Splitting — min_{x∈R^N} F(x) + Σ_{i=1}^n Gi(x) (★), F smooth, each Gi simple. Iterations: for i = 1, ..., n, z_i^(ℓ+1) = z_i^(ℓ) + Prox_{nγGi}(2x^(ℓ) − z_i^(ℓ) − γ∇F(x^(ℓ))) − x^(ℓ); then x^(ℓ+1) = (1/n) Σ_{i=1}^n z_i^(ℓ+1). [Raguet, Fadili, Peyré 2012]
• 117. GFB Splitting — Same iterations. Theorem: let ∇F be L-Lipschitz. If γ < 2/L, x^(ℓ) → x*, a solution of (★). [Raguet, Fadili, Peyré 2012]
• 118. GFB Splitting — Special cases: n = 1 → Forward-Backward; F = 0 → Douglas-Rachford.
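A generic sketch of the GFB iterations above with user-supplied ∇F and proximal maps; the toy usage splits an ℓ1 penalty and a positivity constraint. The interface and names are my own.

```python
import numpy as np

def gfb(grad_f, proxes, x0, gamma, n_iter=500):
    """Generalized Forward-Backward for min_x F(x) + sum_i G_i(x).

    grad_f: gradient of the smooth part F; proxes: list of callables prox_i(x, t) = Prox_{t*G_i}(x);
    gamma < 2/L with L the Lipschitz constant of grad_f.
    """
    n = len(proxes)
    z = [x0.copy() for _ in range(n)]
    x = x0.copy()
    for _ in range(n_iter):
        g = gamma * grad_f(x)
        for i, prox in enumerate(proxes):
            z[i] = z[i] + prox(2 * x - z[i] - g, n * gamma) - x
        x = sum(z) / n
    return x

if __name__ == "__main__":
    # Toy example: min_x 1/2 ||x - c||^2 + ||x||_1 + i_{x >= 0}(x); expected solution [0.5, 0, 0].
    c = np.array([1.5, -2.0, 0.3])
    soft = lambda x, t: np.sign(x) * np.maximum(np.abs(x) - t, 0)
    pos = lambda x, t: np.maximum(x, 0)
    print(gfb(lambda x: x - c, [soft, pos], np.zeros(3), gamma=1.0))
```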
• 119. GFB Fix Point — x* ∈ argmin_x F(x) + Σ_i Gi(x) ⟺ 0 ∈ ∇F(x*) + Σ_i ∂Gi(x*) ⟺ ∃(y_i), y_i ∈ ∂Gi(x*), ∇F(x*) + Σ_i y_i = 0.
• 120. GFB Fix Point — ⟺ ∃(z_i)_{i=1}^n such that, for all i, (1/n)(x* − z_i − γ∇F(x*)) ∈ γ∂Gi(x*) and x* = (1/n) Σ_i z_i (use z_i = x* − γ∇F(x*) − nγ y_i).
• 121. GFB Fix Point — ⟺ (2x* − z_i − γ∇F(x*)) − x* ∈ nγ∂Gi(x*) ⟺ x* = Prox_{nγGi}(2x* − z_i − γ∇F(x*)) ⟺ z_i = z_i + Prox_{nγGi}(2x* − z_i − γ∇F(x*)) − x*.
• 122. GFB Fix Point — → Fixed-point equation on (x*, z_1, ..., z_n).
• 123. Block Regularization — ℓ1−ℓ2 block sparsity: G(x) = Σ_{b∈B} ||x_[b]||, with ||x_[b]||² = Σ_{m∈b} x_m². [figure: image f = Ψx, coefficients x, blocks b ∈ B]
• 124. Block Regularization — Non-overlapping decomposition B = B1 ∪ ... ∪ Bn: G(x) = Σ_{i=1}^n Gi(x), with Gi(x) = Σ_{b∈Bi} ||x_[b]||. [figure: the block sub-families B1, B2]
• 125. Block Regularization — Each Gi is simple: for m ∈ b ∈ Bi, Prox_{γGi}(x)_m = max(0, 1 − γ/||x_[b]||) x_m.
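A sketch of this block soft-thresholding prox for a non-overlapping family of blocks (blocks given as index arrays; names are mine).

```python
import numpy as np

def prox_block_l1l2(x, blocks, gamma):
    """Prox of gamma * sum_b ||x_[b]|| for non-overlapping blocks (block soft thresholding)."""
    out = x.copy()
    for b in blocks:
        nrm = np.linalg.norm(x[b])
        out[b] = max(0.0, 1.0 - gamma / nrm) * x[b] if nrm > 0 else 0.0
    return out

if __name__ == "__main__":
    x = np.array([0.1, 0.2, 3.0, 4.0])
    blocks = [np.array([0, 1]), np.array([2, 3])]     # a non-overlapping decomposition
    print(prox_block_l1l2(x, blocks, gamma=1.0))      # first block vanishes, second shrinks
```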
• 126. Deconv. + Inpaint. — Numerical illustration on 256×256 images: deconvolution, and deconvolution + inpainting, with a data-fidelity term plus the ℓ1−ℓ2 block penalty λ Σ_{k=1}^4 ||x||_{1,2}^(Bk) on wavelet coefficients. [plots: log10(E(x^(ℓ)) − E(x*)) vs. iteration for the EFB, PR and CP schemes, with running times and SNR; images: x₀, observations y, recovered x*]
  • 127. Overview • Optimal Transport and Imaging • Convex Analysis and Proximal Calculus • Forward Backward • Douglas Rachford and ADMM • Generalized Forward-Backward • Primal-Dual Schemes
• 128. Primal-dual Formulation — Fenchel-Rockafellar duality, with A : H → L linear: min_{x∈H} G1(x) + G2(Ax) = min_x G1(x) + sup_{u∈L} ⟨Ax, u⟩ − G2*(u).
• 129. Primal-dual Formulation — Strong duality: 0 ∈ ri(dom(G2)) − A ri(dom(G1)) ⟹ (min ↔ max): = max_u −G2*(u) + min_x G1(x) + ⟨x, A*u⟩ = max_u −G2*(u) − G1*(−A*u).
• 130. Primal-dual Formulation — Recovering x* from a dual solution u*: x* = argmin_x G1(x) + ⟨x, A*u*⟩.
• 131. Primal-dual Formulation — ⟺ −A*u* ∈ ∂G1(x*) ⟺ x* ∈ (∂G1)⁻¹(−A*u*) = ∂G1*(−A*u*).
• 132. Forward-Backward on the Dual — If G1 is strongly convex, ∇²G1 ≥ c Id, i.e. G1(tx + (1−t)y) ≤ t G1(x) + (1−t) G1(y) − (c/2) t(1−t) ||x − y||².
• 133. Forward-Backward on the Dual — Then x* is uniquely defined, x* = ∇G1*(−A*u*), and G1* is of class C¹.
• 134. Forward-Backward on the Dual — FB on the dual: min_{x∈H} G1(x) + G2(Ax) ⟺ min_{u∈L} G1*(−A*u) + G2*(u) (smooth + simple): u^(ℓ+1) = Prox_{τG2*}(u^(ℓ) + τ A ∇G1*(−A*u^(ℓ))).
• 135. Example: TV Denoising — min_{f∈R^N} ½||f − y||² + λ||∇f||₁ ⟺ min_{||u||∞ ≤ λ} ½||y + div(u)||², where ||u||₁ = Σ_i ||u_i|| and ||u||∞ = max_i ||u_i||. Dual solution u*, primal solution f* = y + div(u*). [Chambolle 2004]
• 136. Example: TV Denoising — FB on the dual (aka projected gradient descent): u^(ℓ+1) = Proj_{||·||∞≤λ}(u^(ℓ) + τ ∇(y + div(u^(ℓ)))), with the projection v = Proj_{||·||∞≤λ}(u) given by v_i = u_i / max(||u_i||/λ, 1). Convergence if τ < 2/||div ∘ ∇|| = 1/4.
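A sketch of this dual projected-gradient TV denoiser, assuming periodic boundary conditions for ∇ and div (an assumption; then ||div ∘ ∇|| = 8, so τ = 0.24 < 1/4 matches the bound above).

```python
import numpy as np

def grad(f):
    return np.stack([np.roll(f, -1, axis=0) - f, np.roll(f, -1, axis=1) - f])

def div(u):
    return (u[0] - np.roll(u[0], 1, axis=0)) + (u[1] - np.roll(u[1], 1, axis=1))

def tv_denoise_dual(y, lam, tau=0.24, n_iter=300):
    """Projected gradient on the dual of TV denoising [Chambolle 2004]:
    min_{||u||_inf <= lam} 1/2 ||y + div(u)||^2, then f = y + div(u)."""
    u = np.zeros((2,) + y.shape)
    for _ in range(n_iter):
        u = u + tau * grad(y + div(u))                        # forward step on the smooth dual energy
        nrm = np.maximum(np.sqrt(u[0] ** 2 + u[1] ** 2) / lam, 1.0)
        u = u / nrm                                           # project each u_i onto the ball of radius lam
    return y + div(u)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    f0 = np.zeros((64, 64)); f0[16:48, 16:48] = 1.0
    y = f0 + 0.1 * rng.standard_normal(f0.shape)
    f = tv_denoise_dual(y, lam=0.2)
    print(float(np.abs(f - f0).mean()))
```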
• 137. Primal-Dual Algorithm — min_{x∈H} G1(x) + G2(Ax) ⟺ min_x max_z G1(x) − G2*(z) + ⟨A(x), z⟩.
• 138. Primal-Dual Algorithm — Iterations: z^(ℓ+1) = Prox_{σG2*}(z^(ℓ) + σ A(x̃^(ℓ))); x^(ℓ+1) = Prox_{τG1}(x^(ℓ) − τ A*(z^(ℓ+1))); x̃^(ℓ+1) = x^(ℓ+1) + θ(x^(ℓ+1) − x^(ℓ)). θ = 0: Arrow-Hurwicz algorithm; θ = 1: convergence speed on the duality gap.
• 139. Primal-Dual Algorithm — Same iterations. Theorem [Chambolle-Pock 2011]: if 0 ≤ θ ≤ 1 and στ||A||² < 1, then x^(ℓ) → x*, a minimizer of G1 + G2 ∘ A.
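A generic sketch of the Chambolle-Pock iterations above, parameterized by the two proximal maps and the operator A; the toy usage solves min_x ½||x − c||² + ||Ax||₁, for which Prox_{σG2*} is a simple clipping. Names and the callable interface are my own.

```python
import numpy as np

def chambolle_pock(prox_g1, prox_g2_conj, A, At, x0, sigma, tau, theta=1.0, n_iter=500):
    """Primal-dual (Chambolle-Pock) iterations for min_x G1(x) + G2(A x).

    prox_g1(x, tau) = Prox_{tau*G1}(x); prox_g2_conj(z, sigma) = Prox_{sigma*G2^*}(z);
    A, At: the linear operator and its adjoint. Requires sigma * tau * ||A||^2 < 1.
    """
    x, x_bar, z = x0.copy(), x0.copy(), A(x0)
    for _ in range(n_iter):
        z = prox_g2_conj(z + sigma * A(x_bar), sigma)     # dual ascent step
        x_new = prox_g1(x - tau * At(z), tau)             # primal descent step
        x_bar = x_new + theta * (x_new - x)               # extrapolation
        x = x_new
    return x

if __name__ == "__main__":
    # Toy usage: min_x 1/2 ||x - c||^2 + ||A x||_1 with a small random A.
    rng = np.random.default_rng(0)
    A_mat, c = rng.standard_normal((5, 4)), rng.standard_normal(4)
    prox_g1 = lambda x, t: (x + t * c) / (1 + t)          # prox of t/2 * ||. - c||^2
    prox_g2_conj = lambda z, s: np.clip(z, -1, 1)         # prox of i_{||.||_inf <= 1} = G2^* for G2 = ||.||_1
    L = np.linalg.norm(A_mat, 2)
    x = chambolle_pock(prox_g1, prox_g2_conj, lambda v: A_mat @ v, lambda v: A_mat.T @ v,
                       np.zeros(4), sigma=0.9 / L, tau=0.9 / L)
    print(x)
```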
• 140. Example: Optimal Transport — Staggered grid formulation: min_{x ∈ R^(G¹st) × R^(G²st)} J(I(x)) + ι_C(x), where I = (I₁, I₂) : R^(G¹st) × R^(G²st) → R^(Gc×2) maps the staggered-grid variables onto the centered grid Gc. [figures: staggered grid Gst and centered grid Gc, with space s and time t axes]
• 141. Conclusion — Inverse problems in imaging: large scale (N ≈ 10⁶); non-smooth (sparsity, TV, ...); (sometimes) convex; highly structured (separability, ℓp norms, ...).
• 142. Conclusion — Proximal splitting: unravels the structure of the problems; parallelizable; decomposition G = Σ_k Gk. [figure: block-sparsity penalization]
• 143. Conclusion — Open problems: less structured problems without smoothness; non-convex optimization.