Mixed-Derivative Total Variation

Vincent Guillemet, and Michael Unser

Abstract

The formulation of norms on continuous-domain Banach spaces with exact pixel-based discretization is advantageous for solving inverse problems (IPs). In this paper, we investigate a new regularization that is a convex combination of a TV term and the $\mathcal{M}(\mathbb{R}^{2})$ norm of mixed derivatives. We show that the extreme points of the corresponding unit ball are indicator functions of polygons whose edges are aligned with either the $x_{1}$ - or $x_{2}$ -axis. We then apply this result to construct a new regularization for IPs, which can be discretized exactly by tensor products of first-order B-splines, or equivalently, pixels. Furthermore, we exactly discretize the loss of the denoising problem on its canonical pixel basis and prove that it admits a unique solution, which is also a solution to the underlying continuous-domain IP.

Index Terms:

Geometric measure theory, continuous-domain inverse problems, tensor product, representer theorem, denoising.

I Introduction

The study of continuous-domain formulations of inverse problems (IP) is crucial in computational imaging [1, 2]. In this setting, the objective is to recover an unknown image $f:\mathbb{R}^{2}\to\mathbb{R}$ from noisy measurements. Due to the pixel-based geometry of images, such problems are commonly addressed by seeking an approximate solution $f^{\star}$ in

\mathcal{X}_{1}=\left\{f=\sum_{\mathbf{k}\in\mathbb{Z}^{2}}\mathbf{a}[\mathbf{k}]\mathbbm{1}_{E_{\mathbf{k}}}\right\},

(1)

where the sets $(E_{\mathbf{k}})_{\mathbf{k}\in\mathbb{Z}^{2}}$ form a pixel-based decomposition of $\mathbb{R}^{2}$ . In several acquisition modalities, the forward measurement process is not naturally associated with a canonical pixel basis, in which case $\mathcal{X}_{1}$ must be designed by the practitioner. In this context, it is advantageous to identify a continuous-domain Banach space $\mathcal{X}$ that contains all pixel-type basis functions and within which $\mathcal{X}_{1}$ can be refined. The search space $\mathcal{X}$ is typically selected jointly with the regularization functional $\text{Reg}(\cdot)$ [3, 4], which encodes prior knowledge about the geometry of $f^{\star}$ when the IP is ill-posed.

I-A TV Regularization

A standard regularization for the pixel basis is the total variation (TV) [5], which, for $f\in\mathcal{X}=\mathrm{BV}(\Omega)$ [6, 7] and in its anisotropic form, is given by

\left\lVert\nabla\{f\}\right\rVert_{\mathcal{M}(\mathbb{R}^{2})^{2}}=\\ \left(\left\lVert[\mathrm{I}\otimes\mathrm{D}]\{f\}\right\rVert_{\mathcal{M}(\mathbb{R}^{2})}+\left\lVert[\mathrm{D}\otimes\mathrm{I}]\{f\}\right\rVert_{\mathcal{M}(\mathbb{R}^{2})}\right),

(2)

where $\left\lVert\cdot\right\rVert_{\mathcal{M}(\mathbb{R}^{2})}$ denotes the norm on the space of bounded measures [8]. This norm has the advantage of being exactly discretizable on the pixel basis, which makes it particularly suitable for image processing. Specifically, when $f\in\mathcal{X}_{1}$ is a linear combination of square pixels with weights $\mathbf{a}$ , (2) can be discretized as

c\left(\left\lVert\bm{h}_{0,1}\ast\mathbf{a}\right\rVert_{\ell_{1}}+\left\lVert\bm{h}_{1,0}\ast\mathbf{a}\right\rVert_{\ell_{1}}\right),

(3)

where $c$ is determined by the pixel size, and

\bm{h}_{0,1}=\begin{bmatrix}-1\\ 1\end{bmatrix},\quad\bm{h}_{1,0}=\begin{bmatrix}-1&1\end{bmatrix}.

(4)

The search for the optimal $f^{\star}\in\mathcal{X}_{1}$ thus reduces to a compressed sensing problem [9, 10, 11], which can be solved using classical algorithms such as FISTA [12] or primal-dual methods [13, 14, 15, 16]. One reason for the popularity of TV regularization is the strong theoretical foundation on which it is built.

The norm $\left\lVert\nabla\{\cdot\}\right\rVert_{\mathcal{M}(\mathbb{R}^{2})^{2}}$ is not only well defined for pixels, but also for indicator functions of Caccioppoli sets [17, 18], i.e., sets that can be approximated in measure by polygons. Within the framework of representer theorems (RTs) for Banach spaces [19, 20, 21, 22], Bredies and Carioni [23] built upon the results of [24, 25, 26] to show that TV regularization promotes solutions to inverse problems that are finite linear combinations of simple sets, Caccioppoli sets whose boundaries are represented by a unique Jordan curve [26].

In contrast to the spline solutions promoted by the one-dimensional TV counterpart $\left\lVert\mathrm{D}\{\cdot\}\right\rVert_{\mathcal{M}(\mathbb{R})}$ , the generality of these simple sets prevents them from benefiting from the convenient spline structure [27, 28, 29, 30, 31].

I-B Mixed-Derivative Regularization

One approach to promoting pixel-type solutions with a spline structure is to impose additional regularity. In [32], the authors considered the search space $\mathcal{M}_{\mathrm{D}\otimes\mathrm{D}}(\mathrm{K})$ , consisting of functions $f$ whose generalized gradient $\nabla\{f\}$ and generalized mixed derivative $[\mathrm{D}\otimes\mathrm{D}]\{f\}$ are bounded measures supported in $\mathrm{K}=[0,1]\times[0,1]$ . They showed that the extreme points of the unit ball associated with the norm $\left\lVert[\mathrm{D}\otimes\mathrm{D}]\{\cdot\}\right\rVert_{\mathcal{M}(\mathbb{R}^{2})}$ are tensor products of $\mathrm{D}$ -splines. However, this formulation is not entirely satisfactory because the regularization $\left\lVert[\mathrm{D}\otimes\mathrm{D}]\{\cdot\}\right\rVert_{\mathcal{M}(\mathbb{R}^{2})}$ does not control the perimeter of sets, but only their number of corners. As a result, it may promote solutions containing components that are undesirable, such as indicators of unbounded sets.

I-C Contributions

In this paper, for $\theta\in]0,1]$ , we regularize with the new norm

\left\lVert\cdot\right\rVert_{\theta}=\theta\left\lVert[\mathrm{D}\otimes\mathrm{D}]\{\cdot\}\right\rVert_{\mathcal{M}(\mathbb{R}^{2})}+(1-\theta)\left\lVert\nabla\{\cdot\}\right\rVert_{\mathcal{M}(\mathbb{R}^{2})^{2}},

(5)

which generalizes the anisotropic TV. Our contributions are fourfold.

Functional Framework: We define the search space $\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}(\mathrm{K})$ as the set of functions $f\in\mathcal{M}_{\mathrm{D}\otimes\mathrm{D}}(\mathrm{K})$ supported in $\mathrm{K}$ . We prove that this space is a dual Banach space for the norm $\left\lVert[\mathrm{D}\otimes\mathrm{D}]\{\cdot\}\right\rVert_{\mathcal{M}}$ , and we establish that all $\left\lVert\cdot\right\rVert_{\theta}$ -norms are equivalent.

Extreme Point Characterization: We show that the extreme points of the unit ball of the convex cone of positive functions $\left(\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}^{+}(\mathrm{K}),\left\lVert\cdot\right\rVert_{\theta}\right)$ are indicator functions of polygons whose edges are parallel to either the $x_{1}$ - or $x_{2}$ -axis. We then apply this result to the regularization of abstract IPs with finite data.

Resolution of IPs: We demonstrate that tensor-product B-splines of order 1 exactly discretize a broad class of IP loss functionals, and that $\left\lVert\cdot\right\rVert_{\theta}$ can be discretized exactly as

\theta c\left\lVert\bm{h}_{0,1}\ast\mathbf{a}\right\rVert_{\ell_{1}}+\theta c\left\lVert\bm{h}_{1,0}\ast\mathbf{a}\right\rVert_{\ell_{1}}+(1-\theta)\left\lVert\bm{h}_{1,1}\ast\mathbf{a}\right\rVert_{\ell_{1}},

(6)

where $\bm{h}_{1,1}=\bm{h}_{0,1}\otimes\bm{h}_{1,0}$ . We further prove that these exact discretizations converge, in an appropriate sense, to the continuous-domain IP as the pixel size tends to zero.

Application to Denoising: We prove that the canonical pixel-based discretization of the denoising loss admits a unique solution, which also solves the corresponding continuous-domain problem. Finally, we numerically assess the efficiency of the proposed method.

I-D Related Works

Higher-Order Generalization of TV. Bredies et al. introduced the generalized total variation (GTV) [33, 34, 23], which incorporates all derivatives up to a specified order $k$ . In particular, the second-order TV ( $k=2$ ) is defined on the space of functions with bounded Hessian [35] and, for a smooth function $f$ , takes the form

TV^{(2)}\{f\}=\int_{\mathbb{R}^{2}}\left\lVert\nabla^{2}\{f\}(\mathbf{x})\right\rVert_{F}\,\mathrm{d}\mathbf{x},

(7)

where $\left\lVert\cdot\right\rVert_{F}$ denotes the Frobenius norm. Lefkimiatis et al. extended this notion by introducing the Hessian TV (HTV), in which the Frobenius norm is replaced by a Schatten- $p$ norm [36, 37, 38]. Regularization with HTV has been employed for the learning of continuous piecewise-linear (CPWL) mappings [39, 40, 41]. Ambrosio et al. showed that the closure, in the unit ball of the space of functions with finite HTV, of the set of CPWL extreme points contains all extremal points [42, 43].

Our regularization lies, in terms of regularity constraints, between TV and HTV. Specifically, it enforces more regularity than TV due to the presence of the $\mathrm{D}\otimes\mathrm{D}$ term, but less than HTV because it omits the terms $\mathrm{I}\otimes\mathrm{D}^{2}$ and $\mathrm{D}^{2}\otimes\mathrm{I}$ . In particular, HTV is not well defined on pixel basis.

Extreme Points of Mixed Norms. We investigate the extreme points of the unit ball associated with our regularization norm, which takes the form

\left\lVert\cdot\right\rVert=\theta\left\lVert\cdot\right\rVert_{A}+(1-\theta)\left\lVert\cdot\right\rVert_{B},

where $\left\lVert\cdot\right\rVert_{A}$ and $\left\lVert\cdot\right\rVert_{B}$ are two norms whose extreme points are known. When the search space $\mathcal{X}$ can be decomposed as a direct sum $\mathcal{X}_{A}\oplus\mathcal{X}_{B}$ , with $\left\lVert\cdot\right\rVert_{A}|_{\mathcal{X}_{B}}=0$ and $\left\lVert\cdot\right\rVert_{B}|_{\mathcal{X}_{A}}=0$ , the extreme points of $\left\lVert\cdot\right\rVert$ have been systematically characterized [44]. However, when such a direct-sum decomposition is not possible, as is the case in this work, the description of the extreme points becomes significantly more challenging. In finite dimensions, they are located at the intersections of the hyperplanes that define the boundaries of the unit balls [45, 46, 47]. To the best of our knowledge, no analogous characterization is currently available in infinite-dimensional settings.

Continuous Data. TV was originally introduced as a regularization for the denoising loss $\left\lVert f-\cdot\right\rVert_{\mathcal{L}_{2}(\Omega)}$ , where $\Omega$ is a domain and $f$ denotes the continuous data [5]. Subsequent works have deepened the theoretical understanding of this approach. The existence and uniqueness of solutions for this problem, as well as for the TV flow problem, were established in [48, 49, 50]. The regularity of solutions was analysed in [51], while discretization and convergence properties, including convergence rates, were investigated in [52, 53, 54]. Chambolle and Pock provide a comprehensive overview of TV discretization and convergence techniques in [55]. In the present work, we focus on the study of IPs with finite data.

II Theoretical Formulation

II-A Mixed-Derivative Native Space

Let $\mathrm{K}=[0,1]^{2}$ denote the unit square. We define the native space $\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}(\mathrm{K})$ as

\left\{f\in\mathcal{D}^{\prime}(\mathbb{R}^{2}):\quad\begin{cases}[\mathrm{D}\otimes\mathrm{D}]\{f\}\in\mathcal{M}(\mathrm{K}),\\ \forall\phi\in\mathcal{C}_{c}^{\infty}(\mathrm{K}^{c}),\ \langle f,\phi\rangle=0\end{cases}\right\},

(8)

where $\mathcal{M}(\mathrm{K})$ denotes the space of bounded measures supported in $\mathrm{K}$ , and $\mathrm{K}^{c}=\mathbb{R}^{2}\setminus\mathrm{K}$ . The second condition in (8) enforces that the function $f$ vanishes outside $\mathrm{K}$ . In Proposition 1, we provide a more practical, integral-based characterization of the functions $f$ belonging to $\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}(\mathrm{K})$ . To this end, we introduce the integral transform $[\mathrm{D}\otimes\mathrm{D}]^{-1}\{\cdot\}$ , defined for all $m\in\mathcal{M}(\mathrm{K})$ by

	$\displaystyle[\mathrm{D}\otimes\mathrm{D}]^{-1}\{m\}$	$\displaystyle=\int_{\mathrm{K}}u(\cdot-x_{1})\,u(\cdot-x_{2})\,\mathrm{d}m(x_{1},x_{2})$
		$\displaystyle=(u\otimes u)\ast m,$		(9)

where $u$ is the heaviside function.

Proposition 1.

A function $f$ belongs to $\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}(\mathrm{K})$ if and only if it belongs to

\left\{f\in[\mathrm{D}\otimes\mathrm{D}]^{-1}\{\mathcal{M}(\mathrm{K})\}:\forall\phi\in\mathcal{C}_{c}^{\infty}(\mathrm{K}^{c}),\ \langle f,\phi\rangle=0\right\}.

(10)

Proof of Proposition 1.

Clearly, if $f$ satisfies (10), then $f\in\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}(\mathrm{K})$ . To prove the converse, assume $f\in\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}(\mathrm{K})$ and define $m=[\mathrm{D}\otimes\mathrm{D}]\{f\}$ . We claim that

[\mathrm{D}\otimes\mathrm{D}]^{-1}\{m\}=f

(11)

which proves the proposition. Observe that $[\mathrm{D}\otimes\mathrm{D}]^{-1}\{m\}$ may differ from $f$ only by an element of the null space of $\mathrm{D}\otimes\mathrm{D}$ in $\mathcal{D}^{\prime}(\mathbb{R}^{2})$ . Hence, one can write

f=[\mathrm{D}\otimes\mathrm{D}]^{-1}\{m\}+\phi_{1}\otimes 1+1\otimes\phi_{2},

(12)

for some $(\phi_{1},\phi_{2})\in\mathcal{D}^{\prime}(\mathbb{R})^{2}$ . Note that

	$\displaystyle\text{supp}(m)\subset\mathrm{K}\Rightarrow$	$\displaystyle\text{supp}\left([\mathrm{D}\otimes\mathrm{D}]^{-1}\{m\}\right)\subset[0,\infty[^{2}$
	$\displaystyle\Rightarrow$	$\displaystyle\text{supp}(\phi_{1}\otimes 1+1\otimes\phi_{2})\subset[0,\infty[^{2},$		(13)

which directly implies that $\phi_{1}\otimes 1+1\otimes\phi_{2}=0$ and proves the claim. ∎

Let $\theta\in]0,1]$ , we equip $\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}(\mathrm{K})$ with the mixed norm $\left\lVert\cdot\right\rVert_{\theta}$ defined as

\left\lVert\cdot\right\rVert_{\theta}=\theta\left\lVert[\mathrm{D}\otimes\mathrm{D}]\{\cdot\}\right\rVert_{\mathcal{M}(\mathrm{K})}+(1-\theta)\left\lVert\nabla\{\cdot\}\right\rVert_{\mathcal{M}(\mathrm{K})^{2}},

(14)

with

\left\lVert\nabla\{\cdot\}\right\rVert_{\mathcal{M}(\mathrm{K})^{2}}=\left\lVert[\mathrm{I}\otimes\mathrm{D}]\{\cdot\}\right\rVert_{\mathcal{M}(\mathrm{K})}+\left\lVert[\mathrm{D}\otimes\mathrm{I}]\{\cdot\}\right\rVert_{\mathcal{M}(\mathrm{K})}.

(15)

In the sequel, we write $\left\lVert\cdot\right\rVert_{\mathcal{M}}$ instead of $\left\lVert\cdot\right\rVert_{\mathcal{M}(\mathrm{K})}$ and $\left\lVert\cdot\right\rVert_{\mathcal{M}^{2}}$ instead of $\left\lVert\cdot\right\rVert_{\mathcal{M}(\mathrm{K})^{2}}$ . The key properties of $\left(\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}(\mathrm{K}),\left\lVert\cdot\right\rVert_{\theta}\right)$ are summarised in Theorem 1.

Theorem 1.

Let $\theta\in]0,1]$ . Then, the vector space $\big(\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}(\mathrm{K}),\left\lVert\cdot\right\rVert_{\theta}\big)$ satisfies the following properties:

All norms $\left\lVert\cdot\right\rVert_{\theta}$ are well-defined and equivalent; that is, for all $(\theta,\theta^{\prime})\in]0,1]^{2}$ , there exists a constant $C>0$ such that

\forall f\in\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}(\mathrm{K}):\quad\left\lVert f\right\rVert_{\theta}\leq C\left\lVert f\right\rVert_{\theta^{\prime}}.

(16)

2.

It is a Banach space.

Equipped with the norm $\left\lVert[\mathrm{D}\otimes\mathrm{D}]\{\cdot\}\right\rVert_{\mathcal{M}}$ , it is the dual space of $[\mathrm{D}\otimes\mathrm{D}]\{\mathcal{C}_{0}(\mathbb{R}^{2})\}/M_{0}$ , where

M_{0}=\big\{v\in[\mathrm{D}\otimes\mathrm{D}]\{\mathcal{C}_{0}(\mathbb{R}^{2})\}:\\ \forall f\in\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}(\mathrm{K}),\langle f,v\rangle=0\big\},

(17)

and $[\mathrm{D}\otimes\mathrm{D}]\{\mathcal{C}_{0}(\mathbb{R}^{2})\}$ is equipped with the norm

\left\lVert[\mathrm{D}\otimes\mathrm{D}]\{\phi\}\right\rVert=\left\lVert\phi\right\rVert_{\infty(\mathbb{R}^{2})}.

(18)

Proof of Theorem 1.

Item 1. Let $m=[\mathrm{D}\otimes\mathrm{D}]\{f\}$ and $f=(u\otimes u)\ast m$ . Then,

		$\displaystyle\left\lVert[\mathrm{I}\otimes\mathrm{D}]\{f\}\right\rVert_{\mathcal{M}}=\left\lVert(u\otimes 1)\ast m\right\rVert_{\mathcal{M}}\leq\|\mathrm{K}\|\left\lVert m\right\rVert_{\mathcal{M}}$
	$\displaystyle\Rightarrow$	$\displaystyle\left\lVert\nabla\{f\}\right\rVert_{\mathcal{M}}\leq 2\|\mathrm{K}\|\left\lVert[\mathrm{D}\otimes\mathrm{D}]\{f\}\right\rVert_{\mathcal{M}}$		(19)

where $|\mathrm{K}|=\int_{\mathrm{K}}1\mathrm{d}\mathbf{x}.$ Consequently, (16) holds with

	$\displaystyle C=\text{max}\left(\frac{\theta}{\theta^{\prime}},\frac{1-\theta}{1-\theta^{\prime}}\right),$	$\displaystyle\quad\text{for}\quad\theta^{\prime}\in]0,1[$		(20)
	$\displaystyle C=(\theta+(1-\theta)\|\mathrm{K}\|2),$	$\displaystyle\quad\text{for}\quad\theta^{\prime}=1.$		(21)

Item 2. We know from [32] that $[\mathrm{D}\otimes\mathrm{D}]^{-1}\{\mathcal{M}(\mathrm{K})\}$ is a Banach space for the norm $\left\lVert[\mathrm{D}\otimes\mathrm{D}]\{\cdot\}\right\rVert_{\mathcal{M}}$ . Since $\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}(\mathrm{K})$ is a closed subspace of $[\mathrm{D}\otimes\mathrm{D}]^{-1}(\mathcal{M}(\mathrm{K}))$ , it is also Banach space for $\left\lVert[\mathrm{D}\otimes\mathrm{D}]\{\cdot\}\right\rVert_{\mathcal{M}}$ . By the norm equivalence, it follows that it is a Banach space for $\left\lVert\cdot\right\rVert_{\theta}$ .
Item 3. Here, all quotient spaces are equipped with their quotient norms. We know from [32] that $[\mathrm{D}\otimes\mathrm{D}]^{-1}\{\mathcal{M}(\mathrm{K})\}$ equipped with $\left\lVert[\mathrm{D}\otimes\mathrm{D}]\{\cdot\}\right\rVert_{\mathcal{M}}$ is the dual space of $[\mathrm{D}\otimes\mathrm{D}]\{\mathcal{C}_{0}(\mathbb{R}^{2})\}/M$ , with

M=\{v\in[\mathrm{D}\otimes\mathrm{D}]\{\mathcal{C}_{0}\}(\mathbb{R}^{2}):\\ \forall f\in[\mathrm{D}\otimes\mathrm{D}]^{-1}\{\mathcal{M}(\mathrm{K})\},\langle f,v\rangle=0\}.

(22)

Since $\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}(\mathrm{K})$ is a Banach subspace of $[\mathrm{D}\otimes\mathrm{D}]^{-1}\{\mathcal{M}(\mathrm{K})\}$ , the predual of $\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}(\mathrm{K})$ is given by the (pre-)annihilator

\left([\mathrm{D}\otimes\mathrm{D}]\{\mathcal{C}_{0}(\mathbb{R}^{2})\}/M\right)/M_{0}=[\mathrm{D}\otimes\mathrm{D}]\{\mathcal{C}_{0}(\mathbb{R}^{2})\}/M_{0}.

(23)

∎

The key takeaway of Theorem 1 is that all $\theta$ -norms, for $\theta>0$ , are topologically equivalent on $\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}(\mathrm{K})$ . We have shown that $\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}(\mathrm{K})$ also admits a $\mathrm{weak}^{\star}$ topology, such that $(f_{n})_{n=1}^{\infty}$ is $\mathrm{weak}^{\star}$ -convergent to $f$ in $\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}(\mathrm{K})$ if and only if

\forall[v]\in[\mathrm{D}\otimes\mathrm{D}]\{\mathcal{C}_{0}(\mathbb{R}^{2})\}/M_{0}:\quad\lim_{n\to\infty}\langle f_{n},[v]\rangle=\langle f,[v]\rangle.

(24)

Observe that, even though the quotient seems to complicate the predual structure, one still has that $(f_{n})_{n=1}^{\infty}$ is $\mathrm{weak}^{\star}$ -convergent to $f$ if and only if

\forall v\in[\mathrm{D}\otimes\mathrm{D}]\{\mathcal{C}_{0}(\mathbb{R}^{2})\}:\quad\lim_{n\to\infty}\langle f_{n},v\rangle=\langle f,v\rangle,

(25)

because changing the representative in the quotient class, by definition, does not change the value of the brackets.

Next, we discuss the relation of $\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}(\mathrm{K})$ with the space $\mathrm{BV}(\Omega)$ of functions of bounded variation, which is naturally equipped with the norm $\left\lVert\nabla\{\cdot\}\right\rVert_{\mathcal{M}^{2}}$ . We recall that, for a domain $\Omega$ ,

\mathrm{BV}(\Omega)=\{f\in\mathcal{L}_{1}(\Omega):\nabla\{f\}\in\mathcal{M}(\Omega)^{2}\}.

(26)

Proposition 2.

For a domain $\Omega$ such that $\mathrm{K}\subset\Omega$ , the following holds:

1.

the functional $\left\lVert\nabla\{\cdot\}\right\rVert_{\mathcal{M}^{2}}$ is a norm on $\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}(\mathrm{K})$ ;
2.

the space $\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}(\mathrm{K})$ is a subspace of $\mathrm{BV}(\Omega)$ , i.e.

$\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}(\mathrm{K})\subset\mathrm{BV}(\Omega);$ (27)
3.

the vector space $\big(\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}(\mathrm{K}),\left\lVert\nabla\{\cdot\}\right\rVert_{\mathcal{M}^{2}}\big)$ is not closed.

Proof of Proposition 2.

Item 1. Absolute homogeneity, the triangle inequality, and non-negativity are straightforward. In addition,

\left\lVert\nabla\{f\}\right\rVert_{\mathcal{M}^{2}}=0\Leftrightarrow f=c

(28)

for some $c\in\mathbb{R}$ . Since $f$ is compactly supported, one has that $c=0$ .
Item 2. It follows from Item 1 of Theorem 1 that $\nabla\{f\}\in\mathcal{M}(\mathbb{R}^{2})^{2}$ . Because $\mathrm{supp}(\nabla\{f\})\subset\mathrm{K}\subset\Omega$ , we find that $\nabla\{f\}\in\mathcal{M}(\Omega)^{2}$ . The same argument applies to conclude that $f\in\mathcal{L}_{1}(\Omega)$ .
Item 3. Let $\mathrm{K}=[0,2]^{2}$ and $\beta(\cdot)=\mathbbm{1}_{[0,1]}(\cdot)$ . We define

f_{n+1}(\cdot)=\sum_{k=0}^{n}\beta\left(\frac{\cdot}{2^{-k}}\right)\otimes\beta\left(\frac{\cdot-c_{k}}{2^{-k}}\right)

(29)

with $c_{k}=2\left(1-\frac{1}{2^{k}}\right)$ . Then, a straightforward calculation shows that the sequence $(f_{n})_{n=1}^{\infty}$ with $f_{n}\in\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}(\mathrm{K})$ converges in $\left\lVert\nabla\{\cdot\}\right\rVert_{\mathcal{M}^{2}}$ to $f^{\infty}$ , which is a staircase with an accumulation of steps towards the $x_{2}$ axis. In addition, we calculate that

[\mathrm{D}\otimes\mathrm{D}]\{f^{\infty}\}=\\ \delta_{(0,0)}-\delta_{(1,0)}+\sum_{k=1}^{\infty}\delta_{(2-c_{k},c_{k})}-\delta_{(2-c_{k+1},c_{k})},

(30)

which is an unbounded measure with $\left\lVert[\mathrm{D}\otimes\mathrm{D}]\{f^{\infty}\}\right\rVert_{\mathcal{M}}=\infty$ . Consequently, $\big(\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}(\mathrm{K}),\left\lVert\nabla\{\cdot\}\right\rVert_{\mathcal{M}^{2}}\big)$ is not closed. ∎

II-B Sets, Indecomposable Sets and Approximations

We study the geometrical properties of sets that satisfy $\left\lVert[\mathrm{D}\otimes\mathrm{D}]\{\mathbbm{1}_{A}\}\right\rVert_{\mathcal{M}}<\infty$ . As a first step, we show that an indicator function $\mathbbm{1}_{A}$ , as an element of $\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}(\mathrm{K})$ , forms an equivalence class.

Proposition 3.

Let $A\subset\mathrm{K}$ be a measurable set. If a measurable set $A^{\prime}\subset\mathrm{K}$ differs from $A$ by a set of measure zero, then

\forall\theta\in]0,1],\quad\left\lVert\mathbbm{1}_{A}-\mathbbm{1}_{A^{\prime}}\right\rVert_{\theta}=0.

(31)

In particular, if $\mathbbm{1}_{A}\in\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}(\mathrm{K})$ , then $\mathbbm{1}_{A^{\prime}}\in\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}(\mathrm{K})$ .

Proof of Proposition 3.

We calculate that

	$\displaystyle\forall\phi\in\mathcal{C}_{c}^{\infty}(\mathbb{R}^{2}),\quad\langle\mathbbm{1}_{A}-\mathbbm{1}_{A^{\prime}},[\mathrm{D}\otimes\mathrm{D}]\{\phi\}\rangle=0$	(32)
$\displaystyle\Rightarrow$	$\displaystyle\forall\phi\in\mathcal{C}_{0}(\mathbb{R}^{2}),\quad\langle\mathbbm{1}_{A}-\mathbbm{1}_{A^{\prime}},[\mathrm{D}\otimes\mathrm{D}]\{\phi\}\rangle=0$	(33)
$\displaystyle\Rightarrow$	$\displaystyle\left\lVert\mathbbm{1}_{A}-\mathbbm{1}_{A^{\prime}}\right\rVert_{1}=0$	(34)
$\displaystyle\Rightarrow$	$\displaystyle\left\lVert\mathbbm{1}_{A}-\mathbbm{1}_{A^{\prime}}\right\rVert_{\theta}=0.$	(35)

In (32), we used the assumption that $A^{\prime}$ differs from $A$ by a set of measure zero. In (33) we used the fact that $\mathcal{C}_{c}^{\infty}(\mathbb{R}^{2})$ is dense in $\mathcal{C}_{0}(\mathbb{R}^{2})$ , and that the linear and continuous functional $\langle\mathbbm{1}_{A}-\mathbbm{1}_{A^{\prime}},\cdot\rangle$ admits a unique (linear and continuous) extension to $\mathcal{C}_{0}(\mathbb{R}^{2})$ . In (34) we used Item 3 of Theorem 1. In (35) we used Item 1 of Theorem 1. Finally,

\mathbbm{1}_{A^{\prime}}=\mathbbm{1}_{A}-(\mathbbm{1}_{A}-\mathbbm{1}_{A^{\prime}})\in\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}(\mathrm{K}).

(36)

∎

The fact that $\mathbbm{1}_{A}$ forms an equivalence class is often forgotten in the sequel as it is (often) clear from the context which representative to take. Proposition 4 shows that such indicators must have the form of a polygon with finitely many corners and edges parallel to the $x_{1}$ or the $x_{2}$ axis.

Proposition 4.

If $\mathbbm{1}_{A}\in\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}(\mathrm{K})$ , then $[\mathrm{D}\otimes\mathrm{D}]\{\mathbbm{1}_{A}\}$ is a discrete measure with finitely many atoms. In addition, one has the representation

\mathbbm{1}_{A}=\sum_{k=1}^{K}a_{k}u(\cdot-x_{1,k})u(\cdot-x_{2,k}),\quad\text{with}\quad a_{k}\in\{-1,1\}

(37)

and $(x_{1,k},x_{2,k})\in\mathrm{K}.$

Proof of Proposition 4.

Step 1. We define

	$\displaystyle\mathbbm{1}_{A}(\mathbf{t})$	$\displaystyle=\int_{\mathrm{K}}u(t_{1}-x_{1})u(t_{2}-x_{2})\mathrm{d}m(x_{1},x_{2})$
		$\displaystyle=m([0,t_{1}]\times[0,t_{2}]).$		(38)

Observe that in (38) the value of $\mathbbm{1}_{A}(\mathbf{t})$ is set by measuring the closed square. Other choices, such as $m([0,t_{1}[\times[0,t_{2}[)$ , would yield functions that fall in the same equivalence class as $\mathbbm{1}_{A}.$
Step 2. We assume by contradiction that there exists a set $E$ of the form

\displaystyle E=]e_{1}^{-},e_{1}^{+}]\times]e_{2}^{-},e_{2}^{+}]\cap\mathrm{K},\quad\text{with}\quad(e_{1}^{+},e_{2}^{+})\in\mathrm{K},

(39)

such that $m(E)\notin\mathbb{Z}.$ Observe that, for $E_{0,0}=[0,e_{1}^{-}]\times[0,e_{2}^{-}]\cap\mathrm{K},$ $E_{1,0}=]e_{1}^{-},e_{1}^{+}]\times[0,e_{2}^{-}]\cap\mathrm{K},$ and $E_{0,1}=[0,e_{1}^{-}]\times]e_{2}^{-},e_{2}^{+}]\cap\mathrm{K}$ , one has that

\displaystyle m(E+E_{0,0}+E_{1,0}+E_{0,1})=\mathbbm{1}_{A}(e_{1}^{+},e_{2}^{+})\in\{0,1\}.

(40)

Consequently, we either have that

•

$m(E_{0,0})\notin\mathbb{Z},$ in which case $m(E_{0,0})=\mathbbm{1}_{A}(e_{1}^{-},e_{2}^{-})\in\{0,1\}$ , which is a contradiction;
•
$m(E_{0,1})\notin\mathbb{Z},$ in which case
- –
  
  if $m(E_{0,0})\notin\mathbb{Z}$ , then we find a contradiction as before;
- –
  
  if $m(E_{0,0})\in\mathbb{Z}$ , then $\mathbb{Z}\notowner m(E_{0,1}+E_{0,0})=\mathbbm{1}_{A}(e_{1}^{-},e_{2}^{+})\in\{0,1\}$ which is a contradiction;
•

$m(E_{1,0})\notin\mathbb{Z},$ in which case we reach a contradiction as in the $m(E_{0,1})\notin\mathbb{Z}$ case.

Further, one can prove that the measure $m$

1.

is integer valued on intersections of sets of the form (39);
2.

is integer valued on unions of sets of the form (39);
3.

is integer valued on the relative complement of a set of the form (39).

It follows that $m$ is integer-valued on any Borel set [56, Proposition 1.1.5]. This implies, with the boundedness of $m$ , that [57, Chapters 1-2]

m=\sum_{k=1}^{K}a_{k}\delta_{(x_{1,k},x_{2,k})},\quad a_{k}\in\mathbb{Z}\setminus\{0\}.

(41)

The fact that $\mathbbm{1}_{A}$ is $\{0,1\}$ -valued finally implies that $a_{k}\in\{-1,1\}$ . ∎

One important class is formed by indecomposable sets [23, 26], which we relabel as $\nabla$ -indecomposable in Definition 1. These sets are central in our description of the unit ball of $(\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}^{+}(\mathrm{K}),\left\lVert\cdot\right\rVert_{\theta})$ , as the indecomposability property translates into an extreme point structure. We extend this class to $[\mathrm{D}\otimes\mathrm{D}]$ -indecomposable sets in Definition 2.

Definition 1.

Let $A\subset\mathrm{K}$ be such that $\mathbbm{1}_{A}\in\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}(\mathrm{K})$ . The set $A$ is $\nabla$ -decomposable if there exists a partition $A_{1},A_{2}$ of $A$ such that, $\mathbbm{1}_{A_{1}}\in\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}(\mathrm{K})$ , $\mathbbm{1}_{A_{2}}\in\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}(\mathrm{K})$ ,

\left\lVert\nabla\{\mathbbm{1}_{A_{1}}\}\right\rVert_{\mathcal{M}^{2}}+\left\lVert\nabla\{\mathbbm{1}_{A_{2}}\}\right\rVert_{\mathcal{M}^{2}}=\left\lVert\nabla\{\mathbbm{1}_{A}\}\right\rVert_{\mathcal{M}^{2}}<\infty.

(42)

and $|A_{1}|>0,$ $|A_{2}|>0.$ If the set $A$ is not $\nabla$ -decomposable, it is $\nabla$ -indecomposable. The set of all $\nabla$ -indecomposable sets is denoted by

\mathcal{E}_{\nabla}.

(43)

Definition 2.

Let $A\subset\mathrm{K}$ be such that $\mathbbm{1}_{A}\in\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}(\mathrm{K})$ . The set $A$ is $[\mathrm{D}\otimes\mathrm{D}]$ -decomposable if there exists a partition $A_{1},A_{2}$ of $A$ such that, $\mathbbm{1}_{A_{1}}\in\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}(\mathrm{K})$ , $\mathbbm{1}_{A_{2}}\in\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}(\mathrm{K})$ ,

\left\lVert[\mathrm{D}\otimes\mathrm{D}]\{\mathbbm{1}_{A_{1}}\}\right\rVert_{\mathcal{M}}+\left\lVert[\mathrm{D}\otimes\mathrm{D}]\{\mathbbm{1}_{A_{2}}\}\right\rVert_{\mathcal{M}}=\\ \left\lVert[\mathrm{D}\otimes\mathrm{D}]\{\mathbbm{1}_{A}\}\right\rVert_{\mathcal{M}}<\infty,

(44)

and $|A_{1}|>0,$ $|A_{2}|>0.$ If the set $A$ is not $[\mathrm{D}\otimes\mathrm{D}]$ -decomposable, it is $[\mathrm{D}\otimes\mathrm{D}]$ -indecomposable. The set of all $[\mathrm{D}\otimes\mathrm{D}]$ -indecomposable sets is denoted by

\mathcal{E}_{\mathrm{D}\otimes\mathrm{D}}.

(45)

The relation between $[\mathrm{D}\otimes\mathrm{D}]$ -indecomposability and $\nabla$ -indecomposability is revealed in Proposition 5.

Proposition 5.

The strict inclusion holds true

\mathcal{E}_{\mathrm{D}\otimes\mathrm{D}}\subset\mathcal{E}_{\nabla}.

(46)

Proof of Proposition 5.

Item 1. We provide an example of a $\nabla$ -indecomposable set $A$ that is not $[\mathrm{D}\otimes\mathrm{D}]$ -indecomposable. To do so, we define

	$\displaystyle\mathbbm{1}_{A_{1}}(t_{1},t_{2})=\beta\left(\frac{t_{1}}{3}\right)\otimes\beta(t_{2})$		(47)
	$\displaystyle\mathbbm{1}_{A_{2}}(t_{1},t_{2})=\beta(t_{1}-1)\otimes\beta(t_{2}-1)$		(48)

as well as $\mathbbm{1}_{A}=\mathbbm{1}_{A_{1}}+\mathbbm{1}_{A_{2}}$ . On the one hand, $A$ is $[\mathrm{D}\otimes\mathrm{D}]$ -decomposable as, $\forall i\in\{1,2\}$ ,

\left\lVert[\mathrm{D}\otimes\mathrm{D}]\{\mathbbm{1}_{A_{i}}\}\right\rVert_{\mathcal{M}}=4,\quad\left\lVert[\mathrm{D}\otimes\mathrm{D}]\{\mathbbm{1}_{A}\}\right\rVert_{\mathcal{M}}=8.

(49)

On the other hand, we know from [26, Proposition 2] that $A$ is $\nabla$ -indecomposable.

Item 2. We show that, if $(A_{1},A_{2})$ is a $\nabla$ -decomposition of $A$ , then it is also a $[\mathrm{D}\otimes\mathrm{D}]$ -decomposition. We define $m=[\mathrm{D}\otimes\mathrm{D}]\{\mathbbm{1}_{A}\}$ , $m_{1}=[\mathrm{D}\otimes\mathrm{D}]\{\mathbbm{1}_{A_{1}}\}$ , and $m_{2}=[\mathrm{D}\otimes\mathrm{D}]\{\mathbbm{1}_{A_{2}}\}$ . It follows from Proposition 4 that $m$ , $m_{1}$ , and $m_{2}$ are discrete measures with finitely many atoms, such that

m_{i}=\sum_{k=1}^{K_{i}}a_{i,k}\delta_{b_{i,k}}\otimes\delta_{c_{i,k}}\quad\text{with}\quad a_{i,k}\in\{-1,1\}.

(50)

Assume, by contradiction, that $(A_{1},A_{2})$ fails to be a $[\mathrm{D}\otimes\mathrm{D}]$ -decomposition. Then, there exists $(k,k^{\prime})$ such that

(b_{1,k},c_{1,k})=(b_{2,k^{\prime}},c_{2,k^{\prime}}),\quad\text{and}\quad a_{1,k}=-a_{2,k^{\prime}}.

(51)

This means that $(b_{1,k},c_{1,k})$ (respectively $(b_{2,k^{\prime}},c_{2,k^{\prime}})$ ) is a corner of $A_{1}$ (respectively $A_{2}$ ). Observe that, for $(A_{1},A_{2})$ to be a $\nabla$ -decomposition of $A$ , their edges may overlap only on corners, and not on segments. This implies that both corners are facing each other, in which case $a_{1,k}$ and $a_{2,k^{\prime}}$ must have the same value. This is a contradiction.

∎

Indicator functions $\mathbbm{1}_{A}\in\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}(\mathrm{K})$ are not only useful for the indecomposable structure, but also for their approximation properties. We conclude this section with Lemma 1, which states that any function $f\in\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}(\mathrm{K})$ can be approximated, in an appropriate sense, by linear combinations of indicator functions. The proof is given in Appendix -A.

Lemma 1.

For $f\in\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}(\mathrm{K})$ , there exists a sequence $(f_{n})_{n=1}^{\infty}$ of functions $f_{n}\in\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}(\mathrm{K})$ , such that $[\mathrm{D}\otimes\mathrm{D}]\{f_{n}\}$ is a discrete measure supported on $\frac{1}{n}[1\cdots n]\times\frac{1}{n}[1\cdots n],$ and such that

1.

it is $\mathrm{weak}^{\star}$ -convergent to $f$ in $\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}(\mathrm{K})$ ;

over the mixed derivative,

\left\lVert[\mathrm{D}\otimes\mathrm{D}]\{f\}\right\rVert_{\mathcal{M}}\geq\left\lVert[\mathrm{D}\otimes\mathrm{D}]\{f_{n}\}\right\rVert_{\mathcal{M}}

and

\left\lVert[\mathrm{D}\otimes\mathrm{D}]\{f\}\right\rVert_{\mathcal{M}}=\underset{n\to\infty}{\mathrm{lim}}\left\lVert[\mathrm{D}\otimes\mathrm{D}]\{f_{n}\}\right\rVert_{\mathcal{M}};

3.

it is convergent to $f$ in $\mathcal{L}_{1}(\mathrm{K})$ ;
4.

over the gradient, $\left\lVert\nabla\{f\}\right\rVert_{\mathcal{M}^{2}}=\underset{n\to\infty}{\mathrm{lim}}\left\lVert\nabla\{f_{n}\}\right\rVert_{\mathcal{M}^{2}}$ .

II-C Coarea and Cocorner Formulas

In the sequel, we will use the (convex) cone $\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}^{+}(\mathrm{K})$ of positive functions, defined as

\big\{f\in\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}(\mathrm{K}):\forall\phi\in\mathcal{C}_{c}^{\infty}(\mathbb{R}^{2}),\ \phi\geq 0\Rightarrow\langle f,\phi\rangle\geq 0\big\},

(52)

which is closed in both the strong and the $\mathrm{weak}^{\star}$ topologies.

We develop tools to split a function $f\in\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}^{+}(\mathbb{R}^{2})$ into a sum of indicator functions on indecomposable sets. To this end, we first recall the coarea formula. For $f\in\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}^{+}(\mathbb{R}^{2})$ and $s\in[0,\infty[$ , we define the functions

P_{-}(f,s)=\left\lVert\nabla\{\min(f,s)\}\right\rVert_{\mathcal{M}^{2}}

(53)

and

P_{+}(f,s)=\left\lVert\nabla\{\max(f-s,0)\}\right\rVert_{\mathcal{M}^{2}}.

(54)

Theorem 2.

The functions $P_{-}(f,\cdot)$ and $P_{+}(f,\cdot)$ are continuous, respectively increasing and decreasing, and satisfy the equalities

P_{-}(f,s)=\int_{0}^{s}\left\lVert\nabla\{\mathbbm{1}_{\{x:f(x)\geq\alpha\}}\}\right\rVert_{\mathcal{M}^{2}}\,\mathrm{d}\alpha,

(55)

P_{+}(f,s)=\int_{s}^{\infty}\left\lVert\nabla\{\mathbbm{1}_{\{x:f(x)\geq\alpha\}}\}\right\rVert_{\mathcal{M}^{2}}\,\mathrm{d}\alpha.

(56)

In addition, they satisfy the $BV$ -coarea formula

\left\lVert\nabla\{f\}\right\rVert_{\mathcal{M}^{2}}=P_{-}(f,s)+P_{+}(f,s)\\ =\int_{0}^{\infty}\left\lVert\nabla\{\mathbbm{1}_{\{x:f(x)\geq\alpha\}}\}\right\rVert_{\mathcal{M}^{2}}\,\mathrm{d}\alpha.

(57)

The coarea formula, for the $\ell_{2}$ norm on the gradient, was proved for Lipschitz functions in [58], for BV functions in [59] and was later extended for general norms (for example, see [60]). Equations (55) and (56) follow from the application of the coarea formula on $\text{min}(f,s)$ and $\text{max}(f-s,0)$ .

Next, we provide a new formula inspired by the coarea one. We define the functions $C_{-}(f,\cdot)$ and $C_{+}(f,\cdot)$ as

C_{-}(f,s)=\left\lVert[\mathrm{D}\otimes\mathrm{D}]\{\mathrm{min}(f,s)\}\right\rVert_{\mathcal{M}},

(58)

C_{+}(f,s)=\left\lVert[\mathrm{D}\otimes\mathrm{D}]\{\mathrm{max}(f-s,0)\}\right\rVert_{\mathcal{M}}.

(59)

In Lemma 2, we adapt Theorem 2 for the operator $\mathrm{D}\otimes\mathrm{D}$ and for piecewise constant functions.

Lemma 2.

Let $f\in\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}^{+}(\mathrm{K})$ be such that $[\mathrm{D}\otimes\mathrm{D}]\{f\}$ is a discrete measure with finitely many atoms. Then, the functions $C_{-}(f,\cdot)$ and $C_{+}(f,\cdot)$ are continuous, respectively increasing and decreasing, and satisfy the equalities

C_{-}(f,s)=\int_{0}^{s}\left\lVert[\mathrm{D}\otimes\mathrm{D}]\{\mathbbm{1}_{\{x:f(x)\geq\alpha\}}\}\right\rVert_{\mathcal{M}}\mathrm{d}\alpha,

(60)

C_{+}(f,s)=\int_{s}^{\infty}\left\lVert[\mathrm{D}\otimes\mathrm{D}]\{\mathbbm{1}_{\{x:f(x)\geq\alpha\}}\}\right\rVert_{\mathcal{M}}\mathrm{d}\alpha.

(61)

In addition, they verify the cocorner formula

	$\displaystyle\left\lVert[\mathrm{D}\otimes\mathrm{D}]\{f\}\right\rVert_{\mathcal{M}}=$	$\displaystyle\,C_{-}(f,s)+C_{+}(f,s)$
	$\displaystyle=$	$\displaystyle\int_{0}^{\infty}\left\lVert[\mathrm{D}\otimes\mathrm{D}]\{\mathbbm{1}_{\{x:f(x)\geq\alpha\}}\}\right\rVert_{\mathcal{M}}\mathrm{d}\alpha.$		(62)

Proof of Lemma 2.

We set $[\mathrm{D}\otimes\mathrm{D}]\{f\}=m=\sum_{k=1}^{K}a_{k}\delta_{\mathbf{x}_{k}}$ and define

\tilde{C}_{-}(f,s)=\int_{0}^{s}\left\lVert[\mathrm{D}\otimes\mathrm{D}]\{\mathbbm{1}_{\{x:f(x)\geq\alpha\}}\}\right\rVert_{\mathcal{M}}\mathrm{d}\alpha.

(63)

Observe that $f$ takes finitely many values $(v_{0},v_{1},\ldots,v_{M})$ with $v_{m}\leq v_{m+1}$ , $v_{0}=0$ and $v_{M+1}=\infty$ . Then, for $\alpha\in]v_{m},v_{m+1}[$ one has that

\{x:f(x)\geq\alpha\}=\{x:f(x)>v_{m}\}.

(64)

For $s\in]v_{m^{\prime}},v_{m^{\prime}+1}[$ with $m^{\prime}\in[0\cdots M-1]$ , we calculate that

	$\displaystyle\tilde{C}_{-}(f,s)=$	$\displaystyle\sum_{m=1}^{m^{\prime}}(v_{m}-v_{m-1})\left\lVert[\mathrm{D}\otimes\mathrm{D}]\{\mathbbm{1}_{\{x:f(x)>v_{m-1}\}}\}\right\rVert_{\mathcal{M}}$
		$\displaystyle+(s-v_{m^{\prime}})\left\lVert[\mathrm{D}\otimes\mathrm{D}]\{\mathbbm{1}_{\{x:f(x)>v_{m^{\prime}}\}}\}\right\rVert_{\mathcal{M}}.$		(65)

It follows directly from the representation (II-C) that $\tilde{C}_{-}(f,\cdot)$ is increasing and continuous. Next, we observe that $[\mathrm{D}\otimes\mathrm{D}]\{\mathbbm{1}_{\{x:f(x)>\alpha\}}\}$ may only be non-zero on $\mathbf{x}_{k}$ . In addition, for $\alpha<\alpha^{\prime}$ , $\mathbf{x}_{k}$ is in the support of both $[\mathrm{D}\otimes\mathrm{D}]\{\mathbbm{1}_{\{x:f(x)>\alpha^{\prime}\}}\}$ and $[\mathrm{D}\otimes\mathrm{D}]\{\mathbbm{1}_{\{x:f(x)>\alpha\}}\}$ if and only if it is a corner of $\mathbbm{1}_{\{x:f(x)>\alpha^{\prime}\}}$ and $\mathbbm{1}_{\{x:f(x)>\alpha\}}$ , in which case the Dirac mass centred at $\mathbf{x}_{k}$ must have an amplitude of the same sign because $\{x:f(x)>\alpha^{\prime}\}\subset\{x:f(x)>\alpha\}$ . It follows from this remark and (II-C) that

\tilde{C}_{-}(f,s)=\Bigg\|[\mathrm{D}\otimes\mathrm{D}]\left\{\sum_{m=1}^{m^{\prime}}(v_{m}-v_{m-1})\mathbbm{1}_{\{x:f(x)>v_{m-1}\}}\right\}\\ +[\mathrm{D}\otimes\mathrm{D}]\{(s-v_{m^{\prime}})\mathbbm{1}_{\{x:f(x)>v_{m^{\prime}}\}}\}\Bigg\|_{\mathcal{M}}

(66)

and, consequently, that $\tilde{C}_{-}(f,s)=C_{-}(f,s).$ A similar argumentation would show that $C_{+}(f,\cdot)$ is continuous, decreasing, and that

\displaystyle C_{+}(f,s)=\int_{s}^{\infty}\left\lVert[\mathrm{D}\otimes\mathrm{D}]\{\mathbbm{1}_{\{x:f(x)\geq\alpha\}}\}\right\rVert_{\mathcal{M}}\mathrm{d}\alpha.

(67)

Finally,

$\displaystyle C_{-}(f,s)+C_{+}(f,s)$	$\displaystyle=\int_{0}^{\infty}\left\lVert[\mathrm{D}\otimes\mathrm{D}]\{\mathbbm{1}_{\{x:f(x)\geq\alpha\}}\}\right\rVert_{\mathcal{M}}\mathrm{d}\alpha$
	$\displaystyle=\underset{s\to\infty}{\text{lim}}\left\lVert[\mathrm{D}\otimes\mathrm{D}]\{\text{min}(f,s)\}\right\rVert_{\mathcal{M}}$
	$\displaystyle=\left\lVert[\mathrm{D}\otimes\mathrm{D}]\{f\}\right\rVert_{\mathcal{M}}.$	(68)

∎

In Lemma 3 we extend, with a limit argument, Lemma 2 for arbitrary functions in $\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}^{+}(\mathrm{K}).$

Lemma 3.

If $f\in\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}^{+}(\mathrm{K})$ , then the functions $C_{-}(f,\cdot)$ and $C_{+}(f,\cdot)$ are continuous and respectively increasing and decreasing. In addition, they verify the equality

\forall s>0:\quad\left\lVert[\mathrm{D}\otimes\mathrm{D}]\{f\}\right\rVert=C_{-}(f,s)+C_{+}(f,s).

(69)

Proof of Lemma 3.

Step 1. Consider a sequence of functions $(f_{n})_{n=1}^{\infty}$ converging to $f$ as in Lemma 1. Without loss of generality, we assume that $f_{n}\in\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}^{+}(\mathrm{K})$ . Our Claim 1 is that, up to a subsequence,

\underset{n\to\infty}{\text{lim}}C_{-}(f_{n},s)=C_{-}(f,s),

(70)

and

\underset{n\to\infty}{\text{lim}}C_{+}(f_{n},s)=C_{+}(f,s).

(71)

To prove this, we first recall that

\text{min}(f_{n},s)=\frac{f_{n}+s-|f_{n}-s|}{2}.

(72)

Item 2 of Lemma 1 implies that $f_{n}+s$ and $f_{n}-s$ converge in $\mathcal{L}_{1}(\mathrm{K})$ to $f+s$ and $f-s$ , which, together with the triangle inequality, yields the convergence of $|f_{n}-s|$ to $|f-s|$ . Therefore, $\text{min}(f_{n},s)$ converges in $\mathcal{L}_{1}(\mathrm{K})$ to $\text{min}(f,s)$ . In particular,

\forall\psi\in\mathcal{L}_{\infty}([0,1])\otimes\mathcal{L}_{\infty}([0,1])\subset\mathcal{L}_{\infty}([0,1]^{2}):\\ \underset{n\to\infty}{\text{lim}}\langle\text{min}(f_{n},s),\psi\rangle=\langle\text{min}(f,s),\psi\rangle.

(73)

Remark that

\mathrm{D}\{\mathcal{C}_{c}^{\infty}(\mathbb{R})\}|_{[0,1]}\otimes\mathrm{D}\{\mathcal{C}_{c}^{\infty}(\mathbb{R})\}|_{[0,1]}\subset\mathcal{L}_{\infty}([0,1])\otimes\mathcal{L}_{\infty}([0,1])

(74)

and that $\mathrm{D}\{\mathcal{C}_{c}^{\infty}(\mathbb{R})\}\otimes\mathrm{D}\{\mathcal{C}_{c}^{\infty}(\mathbb{R})\}$ is dense in $[\mathrm{D}\otimes\mathrm{D}]\{\mathcal{C}_{0}(\mathbb{R}^{2})\}$ . Furthermore, it follows from Item 2 of Lemma 1 that

	$\displaystyle\left\lVert[\mathrm{D}\otimes\mathrm{D}]\{\text{min}(f_{n},s)\}\right\rVert_{\mathcal{M}}$	$\displaystyle\leq\left\lVert[\mathrm{D}\otimes\mathrm{D}]\{f_{n}\}\right\rVert_{\mathcal{M}}$
		$\displaystyle\leq\left\lVert[\mathrm{D}\otimes\mathrm{D}]\{f\}\right\rVert_{\mathcal{M}}.$		(75)

Consequently, the sequence $(\text{min}(f_{n},s))_{n=1}^{\infty}$ is uniformly bounded and, from (74), $\mathrm{weak}^{\star}$ -convergent in $\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}^{+}(\mathrm{K})$ to $\text{min}(f,s)$ . A similar argument shows that the sequence $(\text{max}(f_{n}-s,0))_{n=1}^{\infty}$ is uniformly bounded and $\mathrm{weak}^{\star}$ -convergent in $\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}^{+}(\mathrm{K})$ to $\text{max}(f-s,0)$ . It follows that

\left\lVert[\mathrm{D}\otimes\mathrm{D}]\{\text{min}(f,s)\}\right\rVert_{\mathcal{M}}\leq\underset{n\to\infty}{\text{liminf }}C_{-}(f_{n},s),

(76)

and that

\left\lVert[\mathrm{D}\otimes\mathrm{D}]\{\text{max}(f-s,0)\}\right\rVert_{\mathcal{M}}\leq\underset{n\to\infty}{\text{liminf }}C_{+}(f_{n},s).

(77)

In addition,

	$\displaystyle C_{-}(f_{n},s)+C_{+}(f_{n},s)$	$\displaystyle=\left\lVert[\mathrm{D}\otimes\mathrm{D}]\{f_{n}\}\right\rVert_{\mathcal{M}}$		(78)
		$\displaystyle\leq\left\lVert[\mathrm{D}\otimes\mathrm{D}]\{f\}\right\rVert_{\mathcal{M}}.$		(79)

Finally, if either (76) or (77) has a strict inequality, we conclude that

	$\displaystyle\left\lVert[\mathrm{D}\otimes\mathrm{D}]\{f\}\right\rVert_{\mathcal{M}}$
$\displaystyle\leq$	$\displaystyle\left\lVert[\mathrm{D}\otimes\mathrm{D}]\{\text{min}(f,s)\}\right\rVert_{\mathcal{M}}+\left\lVert[\mathrm{D}\otimes\mathrm{D}]\{\text{max}(f-s,0)\}\right\rVert_{\mathcal{M}}$	(80)
$\displaystyle<$	$\displaystyle\underset{n\to\infty}{\text{ liminf }}C_{-}(f_{n},s)+\underset{n\to\infty}{\text{liminf }}C_{+}(f_{n},s)$	(81)
$\displaystyle=$	$\displaystyle\left\lVert[\mathrm{D}\otimes\mathrm{D}]\{f\}\right\rVert_{\mathcal{M}},$	(82)

which is a contradiction. In (80) we used the fact that

f=\text{min}(f,s)+\text{max}(f-s,0),

(83)

and the triangular inequality. In (82) we used (79). Consequently, (76) and (77) hold with an equality and, up to a subsequence, (70) and (71) hold.
Step 2. The function $C_{-}(f,\cdot)$ is increasing. Indeed, Lemma 2 yields that, $\forall n\in\mathbb{N}$ and $\forall s_{1}\leq s_{2}$ ,

\displaystyle C_{-}(f_{n},s_{1})\leq C_{-}(f_{n},s_{2})\Rightarrow C_{-}(f,s_{1})\leq C_{-}(f,s_{2}),

(84)

where we used Claim 1 to pass to the limit. The claim that $C_{+}(f,\cdot)$ is decreasing and (69) are proved likewise.
Step 3. We now show that $C_{-}(f,\cdot)$ and $C_{+}(f,\cdot)$ are continuous. Let $(s_{n})_{n=1}^{\infty}$ be a sequence convergent to $s$ . An argument similar to the one in Step 1 yields that (in parallel with (76) and (77))

	$\displaystyle C_{-}(f,s)$	$\displaystyle=\underset{n\to\infty}{\mathrm{liminf}}\quad C_{-}(f,s_{n}),$
	$\displaystyle C_{+}(f,s)$	$\displaystyle=\underset{n\to\infty}{\mathrm{liminf}}\quad C_{+}(f,s_{n}),$		(85)

from which it follows that

$\displaystyle\left\lVert[\mathrm{D}\otimes\mathrm{D}]\{f\}\right\rVert_{\mathcal{M}}$	$\displaystyle=\underset{n\to\infty}{\mathrm{liminf}}\quad C_{-}(f,s_{n})+C_{+}(f,s_{n})$
	$\displaystyle\leq\underset{n\to\infty}{\mathrm{limsup}}\quad C_{-}(f,s_{n})+C_{+}(f,s_{n})$	(86)
	$\displaystyle=\underset{n\to\infty}{\mathrm{limsup}}\left\lVert[\mathrm{D}\otimes\mathrm{D}]\{f\}\right\rVert_{\mathcal{M}}$	(87)
	$\displaystyle=\left\lVert[\mathrm{D}\otimes\mathrm{D}]\{f\}\right\rVert_{\mathcal{M}}.$	(88)

Equation (87) follows from (69), and the inequality in (86) cannot be strict. It follows that

	$\displaystyle C_{-}(f,s)$	$\displaystyle=\underset{n\to\infty}{\mathrm{liminf}}\quad C_{-}(f,s_{n})=\underset{n\to\infty}{\mathrm{limsup}}\quad C_{-}(f,s_{n}),$
	$\displaystyle C_{+}(f,s)$	$\displaystyle=\underset{n\to\infty}{\mathrm{liminf}}\quad C_{+}(f,s_{n})=\underset{n\to\infty}{\mathrm{limsup}}\quad C_{+}(f,s_{n}).$		(89)

∎

II-D Extreme Point Characterization

We are now in a position to state our main result, Theorem 3.

Theorem 3.

The extreme points of the unit ball in $\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}^{+}(\mathrm{K})$ , equipped with $\left\lVert\cdot\right\rVert_{\theta}$ , are exactly of the form $a\mathbbm{1}_{A}$ with $a=\frac{1}{\left\lVert\mathbbm{1}_{A}\right\rVert_{\theta}}$ and,

•

if $\theta=1,$ then $A\in\mathcal{E}_{\mathrm{D}\otimes\mathrm{D}}$ ;
•

if $\theta\in]0,1[,$ then $A\in\mathcal{E}_{\nabla}.$

Proof of Theorem 3.

Step 1. We show that an extreme point must be of the form $a\mathbbm{1}_{A}$ . Let $f$ be an extreme point and consider the functions

F_{-}(s)=\theta C_{-}(f,s)+(1-\theta)P_{-}(f,s).

(90)

F_{+}(s)=\theta C_{+}(f,s)+(1-\theta)P_{+}(f,s).

(91)

We know that $F_{-}(\cdot)$ and $F_{+}(\cdot)$ are continuous and respectively increasing and decreasing. In addition, they satisfy the equality

1=\left\lVert f\right\rVert_{\theta}=F_{-}(s)+F_{+}(s).

(92)

By continuity, there exists $s_{1}$ such that $F_{-}(s_{1})=F_{+}(s_{1})=0.5$ , with the decomposition

f=\frac{1}{2}\,2\text{min}(f,s_{1})+\frac{1}{2}\,2\text{max}(f-s_{1},0).

(93)

We find that

\left\lVert 2\text{min}(f,s_{1})\right\rVert_{\theta}=2F_{-}(f,s_{1})=1

(94)

and that

\left\lVert 2\text{max}(f-s_{1},0)\right\rVert_{\theta}=2F_{+}(f,s_{1})=1.

(95)

Since $f$ is an extreme point, we must have that

2\text{max}(f-s_{1},0)=2\text{min}(f,s_{1}).

(96)

If $f\geq s_{1}$ , then $f=2s_{1}$ . If $f\leq s_{1}$ , then $f=0.$ These two equations imply that $f$ has to be of the form $a\mathbbm{1}_{A}$ with $a=\frac{1}{\left\lVert\mathbbm{1}_{A}\right\rVert_{\theta}}.$ Finally, if $\theta\in]0,1[,$ we assume by contradiction that $E$ is not $\nabla$ -indecomposable. Then, it is not $[\mathrm{D}\otimes\mathrm{D}]$ -indecomposable and there exist two sets $A_{1},A_{2}$ , such that $E=A_{1}\cup A_{2}$ ,

\left\lVert[\mathrm{D}\otimes\mathrm{D}]\{\mathbbm{1}_{A}\}\right\rVert_{\mathcal{M}}\\ =\left\lVert[\mathrm{D}\otimes\mathrm{D}]\{\mathbbm{1}_{A_{1}}\}\right\rVert_{\mathcal{M}}+\left\lVert[\mathrm{D}\otimes\mathrm{D}]\{\mathbbm{1}_{A_{2}}\}\right\rVert_{\mathcal{M}},

(97)

and

\left\lVert\nabla\{\mathbbm{1}_{A}\}\right\rVert_{\mathcal{M}^{2}}=\left\lVert\nabla\{\mathbbm{1}_{A_{1}}\}\right\rVert_{\mathcal{M}^{2}}+\left\lVert\nabla\{\mathbbm{1}_{A_{2}}\}\right\rVert_{\mathcal{M}^{2}}.

(98)

We set $f_{1}=\frac{\mathbbm{1}_{A_{1}}}{\left\lVert\mathbbm{1}_{A_{1}}\right\rVert_{\theta}}$ , $f_{2}=\frac{\mathbbm{1}_{A_{2}}}{\left\lVert\mathbbm{1}_{A_{2}}\right\rVert_{\theta}}$ and find that

a\mathbbm{1}_{A}=a\left\lVert\mathbbm{1}_{A_{1}}\right\rVert_{\theta}\mathbbm{1}_{A_{1}}+a\left\lVert\mathbbm{1}_{A_{2}}\right\rVert_{\theta}\mathbbm{1}_{A_{2}},

(99)

which is in contradiction with the assumption that $a\mathbbm{1}_{A}$ is an extreme point. The case $\theta=1$ is similar and omitted.
Step 2. We show that $a\mathbbm{1}_{A}$ is an extreme point. Assume by contradiction that it is not. Then, there exists $\lambda\in\,]0,1[$ and two functions $f_{1}\neq f_{2}$ in the unit ball such that

a\mathbbm{1}_{A}=\lambda f_{1}+(1-\lambda)f_{2}.

(100)

Let $E\subset[0,1]^{2}$ be a measurable set. We define

\left\lVert\cdot\right\rVert_{\theta,E}=\theta\left\lVert[\mathrm{D}\otimes\mathrm{D}]\{\cdot\}\right\rVert_{\mathcal{M}(E)}+(1-\theta)\left\lVert\nabla\{\cdot\}\right\rVert_{\mathcal{M}(E)^{2}}.

(101)

We must have

\left\lVert a\mathbbm{1}_{A}\right\rVert_{\theta,E}=\lambda\left\lVert f_{1}\right\rVert_{\theta,E}+(1-\lambda)\left\lVert f_{2}\right\rVert_{\theta,E},

(102)

otherwise we would reach the contradiction

$\displaystyle 1$	$\displaystyle=\left\lVert a\mathbbm{1}_{A}\right\rVert_{\theta,E}+\left\lVert a\mathbbm{1}_{E}\right\rVert_{\theta,E^{c}}$
	$\displaystyle<\lambda\left\lVert f_{1}\right\rVert_{\theta,E}+(1-\lambda)\left\lVert f_{2}\right\rVert_{\theta,E}$
	$\displaystyle\quad+\lambda\left\lVert f_{1}\right\rVert_{\theta,E^{c}}+(1-\lambda)\left\lVert f_{2}\right\rVert_{\theta,E^{c}}=1,$	(103)

which is impossible.

If $\theta\in\,]0,1[$ , then (102) implies that $\mathrm{supp}\{\nabla\{f_{i}\}\}\subset\partial A.$ In particular, $\nabla\{f_{i}\}$ must vanish in the interior $\overset{\circ}{A}$ . It follows from the Constancy Theorem [61], [23, Lemma 4.6] that $f_{i}$ is constant on $A$ . Finally, since $a\mathbbm{1}_{A}=\lambda f_{1}+(1-\lambda)f_{2}$ with $f_{1}\geq 0$ and $f_{2}\geq 0$ , it follows that $f_{1}$ and $f_{2}$ must vanish on $\mathbb{R}^{2}\setminus A$ and satisfy $f_{1}=f_{2}=a\mathbbm{1}_{A}$ . This contradicts the assumption that $f_{1}\neq f_{2}$ .

If $\theta=1$ , then $\mathrm{supp}\{[\mathrm{D}\otimes\mathrm{D}]\{f_{1}\}\}$ and $\mathrm{supp}\{[\mathrm{D}\otimes\mathrm{D}]\{f_{2}\}\}$ must be subsets of the finite set of corners of $A$ , denoted by $A_{c}$ . We use $E_{c}$ with subscript $c$ to denote the set of corners $\mathrm{supp}([\mathrm{D}\otimes\mathrm{D}]\{1_{E}\})$ of a set $E$ . The functions $f_{1}$ and $f_{2}$ must be piecewise constant on $A$ and vanish outside. Assume, by contradiction, that $f_{1}$ and $f_{2}$ are not constant on $A$ . Then,

f_{i}=\sum_{k=1}^{K}a_{i,k}\mathbbm{1}_{E_{k}},\quad\text{with}a_{i,k}\neq a_{i,k^{\prime}},

(104)

and, by construction, $a_{2,k}=a-a_{1,k}$ . We claim that there exists a $k\in[1\cdots K]$ , wlog $k=1$ , such that

\begin{cases}A_{c}\subset E_{1,c}\\ \partial A\subset\partial E.\end{cases}

(105)

Assume by contradiction that there exists no such $k$ . Then, there exists a corner $\tilde{x}$ and two indices $(k,k^{\prime})$ , wlog $(k,k^{\prime})=(1,2)$ , such that

\begin{cases}\tilde{x}\in\partial A\\ \tilde{x}\notin A_{c}\\ \tilde{x}\in E_{1,c}\\ \tilde{x}\in E_{2,c}.\end{cases}

(106)

This implies that $\tilde{x}\notin\mathrm{supp}([\mathrm{D}\otimes\mathrm{D}]\{f_{i}\})$ and, therefore, that one must have $a_{i,1}=a_{i,2}$ for the associated Dirac mass to cancel each other. We apply the same argumentation to find an index k, wlog $k=2$ , such that

\begin{cases}\partial E_{1}\setminus\partial A\subset\partial E_{2}\\ E_{1,c}\setminus A_{c}\subset E_{2,c}.\end{cases}

(107)

In particular, $E_{1}$ and $E_{2}$ share a corner $\tilde{x}\notin\mathrm{supp}([\mathrm{D}\otimes\mathrm{D}]\{f_{i}\})$ , which therefore must be cancelled. This happens if and only if $a_{i,1}=a_{i,2}$ , which is a contradiction. Thus, $f_{1}=a_{1,1}E_{1}$ , $f_{2}=a_{2,1}E_{1},$ and $a_{1,1}=a_{2,1}$ . Finally $f_{1}=f_{2}$ , which is a contradiction with the initial assumption.

∎

In Corollary 1, we leverage the characterisation of extreme points from Theorem 3 to analyse the solution set of a data-fidelity term regularised by $\left\lVert\cdot\right\rVert_{\theta}.$

Corollary 1.

Let the loss functional $\mathcal{J}$ be defined as

\mathcal{J}(\cdot)=E(\mathbf{y},\langle\cdot,\bm{\nu}\rangle)+\lambda\left\lVert\cdot\right\rVert_{\theta}

(108)

with

•

$\mathbf{y}\in\mathbb{R}^{M}$ is a fixed-dimensional vector representing the available measurements;
•

the data-fidelity functional $E:\mathbb{R}^{M}\times\mathbb{R}^{M}\to\mathbb{R}^{+}\cup\{\infty\}$ is proper, coercive, continuous, and strictly convex in its second argument;
•

the measurement operator $\langle\cdot,\bm{\nu}\rangle:\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}^{+}(\mathrm{K})\to\mathbb{R}^{M}$ is linear and $\mathrm{weak}^{\star}$ continuous.

Then, the solution set

\mathcal{V}=\underset{f\in\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}^{+}(\mathrm{K})}{\operatorname{argmin}}\,\mathcal{J}(f)

(109)

is nonempty, convex, and $\mathrm{weak}^{\star}$ -compact. Moreover, its extreme points are of the form

\sum_{k=1}^{K}a_{k}\mathbbm{1}_{E_{k}},\quad\text{with}\quad K\leq M,

(110)

where $E_{k}\in\mathcal{E}_{\nabla}$ if $\theta\in]0,1[$ , and $E_{k}\in\mathcal{E}_{\mathrm{D}\otimes\mathrm{D}}$ if $\theta=1$ .

Proof of Corollary 1.

Define $\mathcal{J}^{\star}=\underset{f\in\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}^{+}(\mathrm{K})}{\text{inf}}\mathcal{J}(f)$ and consider a sequence of solutions $(f_{n})_{n=1}^{\infty}$ such that

-\infty<\underset{n\to\infty}{\text{lim}}\mathcal{J}(f_{n})=\mathcal{J}^{\star}<\infty,

(111)

which exists because $\mathcal{J}$ is coercive and proper. It follows that $(f_{n})_{n=1}^{\infty}$ is bounded in $\left\lVert\cdot\right\rVert_{\theta}$ and, by norm equivalence, bounded in $\left\lVert[\mathrm{D}\otimes\mathrm{D}]\{\cdot\}\right\rVert_{\mathcal{M}}$ . According to the Banach–Alaoglu theorem, up to a subsequence, $(f_{n})_{n=1}^{\infty}$ is $\mathrm{weak}^{\star}$ -convergent to some limit $f$ . It follows from the $\mathrm{weak}^{\star}$ -lower semicontinuity of $\mathcal{J}$ that

\underset{n\to\infty}{\text{lim}}\mathcal{J}(f_{n})=\mathcal{J}(f)=\mathcal{J}^{\star}.

(112)

Consequently, $\mathcal{V}$ is nonempty, and the characterisation of its extreme points follows from [22, Corollary 3.8] and Theorem 3. ∎

The norm $\left\lVert[\mathrm{D}\otimes\mathrm{D}]\{\cdot\}\right\rVert_{\mathcal{M}}$ penalises the number of corners of the sets $E_{k}$ , whereas $\left\lVert\nabla\{\cdot\}\right\rVert_{\mathcal{M}^{2}}$ penalises their perimeter. Each has a distinct regularisation effect on the solutions of the optimisation problem. Informally, the former promotes solutions with simpler sets $E_{k}$ , while the latter favours solutions with smaller sets $E_{k}$ . This effect is proportional to $\theta$ , which, in practice, must be tuned.

We conclude this section with Proposition 6, which identifies an important class of $\mathrm{weak}^{\star}$ -continuous measurement operators $\langle\cdot,\bm{\nu}\rangle$ .

Proposition 6.

If $\bm{\nu}=(\nu_{m})_{m=1}^{M}$ is such that $\nu_{m}\in\mathcal{L}_{\infty}(\mathrm{K})$ , then the operator

\langle\cdot,\bm{\nu}\rangle:\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}(\mathrm{K})\to\mathbb{R}^{M},\quad f\mapsto(\langle f,\nu_{m}\rangle)_{m=1}^{M}

(113)

is $\mathrm{weak}^{\star}$ -continuous.

Proof of Proposition 6.

From [32] we have

\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}(\mathrm{K})\subset\mathcal{M}_{\mathrm{D}\otimes\mathrm{D}}(\mathrm{K})\subset\mathcal{M}_{\mathrm{D}\otimes\mathrm{D}}(\mathbb{R}^{2}).

(114)

The space $\mathcal{M}_{\mathrm{D}\otimes\mathrm{D}}(\mathbb{R}^{2})$ , which is essentially the space of distributions $f\in\mathcal{D}^{\prime}(\mathbb{R}^{2})$ such that $[\mathrm{D}\otimes\mathrm{D}]\{f\}\in\mathcal{M}(\mathbb{R}^{2})$ , is the dual space of $\mathcal{C}_{\mathrm{D}\otimes\mathrm{D}}(\mathbb{R}^{2})$ . In particular, it is shown in [32] that $\mathcal{L}_{\infty}(\mathrm{K})\subset\mathcal{C}_{\mathrm{D}\otimes\mathrm{D}}(\mathbb{R}^{2})$ . This implies that if $\bm{\nu}=(\nu_{m})_{m=1}^{M}$ is such that $\nu_{m}\in\mathcal{L}_{\infty}(\mathrm{K})$ , then the operator

\langle\cdot,\bm{\nu}\rangle:\mathcal{M}_{\mathrm{D}\otimes\mathrm{D}}(\mathbb{R}^{2})\to\mathbb{R}^{M},\quad f\mapsto(\langle f,\nu_{m}\rangle)_{m=1}^{M}

(115)

is $\mathrm{weak}^{\star}$ -continuous. This, together with (114), completes the proof. ∎

III Resolution of the Optimization Problem

The resolution of the problem in (109) is not directly feasible, as the underlying search space $\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}^{+}(\mathrm{K})$ is infinite-dimensional. Moreover, the fact that the edges and corners of $E_{k}$ in (110) may lie anywhere in $\mathrm{K}$ makes their estimation challenging.

III-A Pixel-Based Exact Discretization

We discretize the search space into a finite-dimensional cone, of functions of the form (110), with corners that lie on the dyadic grid

\mathcal{X}_{n}=2^{-n}[0\cdots 2^{n}-1]\times 2^{-n}[0\cdots 2^{n}-1],

(116)

with $(2^{n}+1)^{2}$ knots, and associated pixels

E^{n}_{n_{1},n_{2}}=\left[\frac{(n_{1}-1)}{2^{n}},\frac{n_{1}}{2^{n}}\right]\times\left[\frac{(n_{2}-1)}{2^{n}},\frac{n_{2}}{2^{n}}\right].

(117)

The discretized search space $\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}^{+}(\mathcal{X}_{n})$ is

\left\{f\in\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}^{+}(\mathrm{K}):\quad[\mathrm{D}\otimes\mathrm{D}]\{f\}\in\mathcal{M}(\mathcal{X}_{n})\right\},

(118)

and the associated discretized optimization problem is as in

\mathcal{V}_{n}=\underset{f\in\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}^{+}(\mathcal{X}_{n})}{\text{argmin}}\left(E(\mathbf{y},\langle f,\bm{\nu}\rangle)+\lambda\left\lVert f\right\rVert_{\theta}\right).

(119)

Our objective is now to reformulate (119) as an optimisation problem over $\mathbb{R}^{2^{n}\times 2^{n}}_{+}$ , where $\mathbb{R}_{+}=[0,\infty[$ . We note that any function $f\in\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}^{+}(\mathcal{X}_{n})$ can be uniquely expressed as

f=\sum_{n_{1},n_{2}=1,1}^{2^{n},2^{n}}a_{n_{1},n_{2}}\mathbbm{1}_{E_{n_{1},n_{2}}^{n}},\\ \mathbf{a}=(a_{n_{1},n_{2}})_{n_{1},n_{2}=1,1}^{2^{n},2^{n}}\in\mathbb{R}^{2^{n}\times 2^{n}}_{+},

(120)

which corresponds to the tensor product of B-splines of order 1. Furthermore, we introduce the synthesis operator $\mathrm{T}$

\mathrm{T}:\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}^{+}(\mathcal{X}_{n})\to\mathbb{R}_{+}^{2^{n}\times 2^{n}},\\ f\mapsto\left(2^{2n}\langle f,\mathbbm{1}_{E^{n}_{n_{1},n_{2}}}\rangle\right)_{n_{1},n_{2}=1,1}^{2^{n},2^{n}},

(121)

and its adjoint, the analysis operator $\mathrm{T}^{\star}$

\mathrm{T}^{\star}:\mathbb{R}_{+}^{2^{n}\times 2^{n}}\to\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}^{+}(\mathcal{X}_{n}),\\ \mathbf{a}\mapsto\sum_{n_{1},n_{2}=1,1}^{2^{n},2^{n}}a_{n_{1},n_{2}}\mathbbm{1}_{E_{n_{1},n_{2}}^{n}}.

(122)

The operators $\mathrm{T}$ and $\mathrm{T}^{\star}$ are implicitly dependent on the grid $\mathcal{X}_{n}$ . In the sequel, we shall occasionally regard $\mathbf{a}$ as a sequence indexed over $\mathbb{Z}^{2}$ , in which case its entries will be denoted by $\mathbf{a}[\cdot,\cdot]$ and taken to be equal to $0$ outside $[0\cdots 2^{n}]\times[0\cdots 2^{n}]$ . We consider the new optimisation problem

\tilde{\mathcal{V}}_{n}=\underset{\mathbf{a}\in\mathbb{R}^{2^{n}\times 2^{n}}_{+}}{\mathrm{argmin}}\left(E(\mathbf{y},\langle\mathrm{T}^{\star}\{\mathbf{a}\},\bm{\nu}\rangle)+\lambda\left\lVert\mathbf{a}\right\rVert_{\theta}\right),

(123)

where

\left\lVert\mathbf{a}\right\rVert_{\theta}=(1-\theta)2^{-n}\left(\left\lVert\bm{h}_{1,0}\ast\mathbf{a}\right\rVert_{\ell_{1}}+\left\lVert\bm{h}_{0,1}\ast\mathbf{a}\right\rVert_{\ell_{1}}\right)\\ +\theta\left\lVert\bm{h}_{1,1}\ast\mathbf{a}\right\rVert_{\ell_{1}},

(124)

with

\bm{h}_{1,1}=\begin{bmatrix}1&-1\\ -1&1\end{bmatrix},\quad\bm{h}_{1,0}=\begin{bmatrix}1\\ -1\end{bmatrix},\quad\bm{h}_{0,1}=\begin{bmatrix}1&-1\\ \end{bmatrix}.

(125)

A key feature of this formulation, as shown in Proposition 7, is that the problem in (123) is the exact discretisation of the problem in (119), i.e., it incurs no discretisation error.

Proposition 7.

The optimization problem in (123) is equivalent to the one in (119), in the sense that,

\displaystyle f\in\mathcal{V}_{n}

\displaystyle\Leftrightarrow\mathrm{T}\{f\}\in\tilde{\mathcal{V}}_{n}.

(126)

Proof of Proposition 7.

Let $\mathbf{a}=\mathrm{T}\{f\}$ . Observing that $\mathrm{T}^{\star}$ is the inverse of $\mathrm{T}$ , it suffices to prove that

$\displaystyle\left\lVert[\mathrm{D}\otimes\mathrm{D}]\{f\}\right\rVert_{\mathcal{M}}$	$\displaystyle=\left\lVert\bm{h}_{1,1}\ast\mathbf{a}\right\rVert_{\ell_{1}},$	(127)
$\displaystyle\left\lVert[\mathrm{I}\otimes\mathrm{D}]\{f\}\right\rVert_{\mathcal{M}}$	$\displaystyle=2^{-n}\left\lVert\bm{h}_{0,1}\ast\mathbf{a}\right\rVert_{\ell_{1}},$	(128)
$\displaystyle\left\lVert[\mathrm{D}\otimes\mathrm{I}]\{f\}\right\rVert_{\mathcal{M}}$	$\displaystyle=2^{-n}\left\lVert\bm{h}_{1,0}\ast\mathbf{a}\right\rVert_{\ell_{1}}.$	(129)

We prove only (127), as the reasoning for the other two is analogous. We observe that

\displaystyle\mathbbm{1}_{E_{n_{1},n_{2}}^{n}}=\beta\left(2^{n}\cdot-(n_{1}-1)\right)\otimes\beta\left(2^{n}\cdot-(n_{2}-1)\right),

(130)

and calculate that

	$\displaystyle[\mathrm{D}\otimes\mathrm{D}]\{f\}$
$\displaystyle=$	$\displaystyle\sum_{n_{1},n_{2}=1,1}^{2^{n},2^{n}}\mathbf{a}[n_{1},n_{2}]\left(\delta_{\frac{n_{1}}{2^{n}}}-\delta_{\frac{(n_{1}-1)}{2^{n}}}\right)\otimes\left(\delta_{\frac{n_{2}}{2^{n}}}-\delta_{\frac{(n_{2}-1)}{2^{n}}}\right)$
$\displaystyle=$	$\displaystyle\sum_{(n_{1},n_{2})\in\mathbb{Z}^{2}}\mathbf{a}[n_{1},n_{2}]\left(\delta_{\frac{n_{1}}{2^{n}}}-\delta_{\frac{(n_{1}-1)}{2^{n}}}\right)\otimes\left(\delta_{\frac{n_{2}}{2^{n}}}-\delta_{\frac{(n_{2}-1)}{2^{n}}}\right)$
$\displaystyle=$	$\displaystyle\sum_{(n_{1},n_{2})\in\mathbb{Z}^{2}}\delta_{\frac{n_{1}}{2^{n}},\frac{n_{2}}{2^{n}}}\bigg(\mathbf{a}[n_{1},n_{2}]-\mathbf{a}[n_{1}+1,n_{2}]$
	$\displaystyle-\mathbf{a}[n_{1},n_{2}+1]+\mathbf{a}[n_{1}+1,n_{2}+1]\bigg).$	(131)

Thus

\displaystyle\left\lVert[\mathrm{D}\otimes\mathrm{D}]\{f\}\right\rVert_{\mathcal{M}}=\sum_{(n_{1},n_{2})\in\mathbb{Z}^{2}}\big|(\bm{h}_{1,1}\ast\mathbf{a})[n_{1},n_{2}]\big|.

(132)

∎

Proposition 7 enables one to obtain a solution $f_{n}^{\star}\in\mathcal{V}_{n}$ by solving the problem in (123). Moreover, since the operator

\langle\mathrm{T}^{\star}\{\cdot\},\bm{\nu}\rangle:\mathbb{R}^{2^{n}\times 2^{n}}_{+}\to\mathbb{R}^{M}

(133)

is linear, when $\mathbf{a}$ is vectorised, the operator in (133) can be represented by a matrix. Consequently, (123) can be solved using a primal–dual algorithm [16] or a double-staged FISTA [62].

III-B Convergence

The solution $f_{n}^{\star}\in\mathcal{V}_{n}$ obtained by solving the discretized optimisation problem (119) is only an approximation of some $f^{\star}\in\mathcal{V}$ . Therefore, for this exact grid-based discretisation to be reliable and practical, one must be confident that, in some sense,

\underset{n\to\infty}{\lim}f_{n}^{\star}=f^{\star}.

(134)

This convergence is established in Theorem 4.

Theorem 4.

For any sequence $(f_{n}^{\star})_{n=1}^{\infty}$ of discretized solutions with $f_{n}^{\star}\in\mathcal{V}_{n},$ and for any solution $f^{\star}\in\mathcal{V}$ , the following convergences hold:

1.

over the loss functional, $\underset{n\to\infty}{\mathrm{lim}}\mathcal{J}(f_{n}^{\star})=\mathcal{J}(f^{\star})$ ;
2.

over the simulated measurements, $\underset{n\to\infty}{\mathrm{lim}}\langle f_{n}^{\star},\bm{\nu}\rangle=\langle f^{\star},\bm{\nu}\rangle$ .

Proof of Theorem 4.

Item 1. We fix $f^{\star}\in\mathcal{V}$ and use Lemma 1 to find a sequence $(\tilde{f}_{n})_{n=1}^{\infty}$ with $\tilde{f}_{n}\in\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}^{+}(\mathcal{X}_{n})$ that is $\mathrm{weak}^{\star}$ -convergent to $f^{\star}$ and satisfies $\left\lVert f^{\star}\right\rVert_{\theta}=\underset{n\to\infty}{\lim}\left\lVert\tilde{f}_{n}\right\rVert_{\theta}$ . Then, the $\mathrm{weak}^{\star}$ -convergence implies that

\underset{n\to\infty}{\lim}E(\mathbf{y},\langle\tilde{f}_{n},\bm{\nu}\rangle)=E(\mathbf{y},\langle f^{\star},\bm{\nu}\rangle)\\ \Rightarrow\underset{n\to\infty}{\lim}\mathcal{J}(\tilde{f}_{n})=\mathcal{J}(f^{\star}).

(135)

It follows from (135) that

\mathcal{J}(f^{\star})\leq\mathcal{J}(f_{n}^{\star})\leq\mathcal{J}(\tilde{f}_{n})\Rightarrow\underset{n\to\infty}{\lim}\mathcal{J}(f_{n}^{\star})=\mathcal{J}(f^{\star}).

(136)

Item 2. The convergence in Item 1 implies that the sequence $(\mathcal{J}(f_{n}^{\star}))_{n=1}^{\infty}$ is bounded. In turn, this boundedness, combined with the assumptions on $E$ , implies that the sequence $(f_{n}^{\star})_{n=1}^{\infty}$ is bounded in $\left\lVert\cdot\right\rVert_{\theta}$ , and therefore in $\left\lVert[\mathrm{D}\otimes\mathrm{D}]\{\cdot\}\right\rVert_{\mathcal{M}}$ . By the Banach-Alaoglu theorem, we extract a $\mathrm{weak}^{\star}$ -convergent subsequence $(f_{n_{m}}^{\star})_{m=1}^{\infty}$ and find that

\underset{m\to\infty}{\lim}\langle f_{n_{m}}^{\star},\bm{\nu}\rangle=\langle f^{\star\star},\bm{\nu}\rangle

(137)

for some $f^{\star\star}\in\mathcal{V}$ . If one assumes by contradiction that the sequence $(\langle f_{n},\bm{\nu}\rangle)_{n=1}^{\infty}$ does not converge to $\langle f^{\star\star},\bm{\nu}\rangle$ , then a classical argument [30, Theorem 3], which leverages the strict convexity of $E(\mathbf{y},\cdot)$ , leads to a contradiction. ∎

IV Application to Denoising

IV-A General Description

Our goal is to denoise an image $\mathbf{y}$ , which is assumed to be square and composed of pixels

E_{n_{1},n_{2}}^{N}=\left[\frac{n_{1}-1}{2^{N}},\frac{n_{1}}{2^{N}}\right]\times\left[\frac{n_{2}-1}{2^{N}},\frac{n_{2}}{2^{N}}\right]

(138)

at resolution $N$ . The continuous-domain optimization problem is formulated as

\mathcal{V}=\underset{f\in\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}^{+}(\mathrm{K})}{\mathrm{argmin}}\left(\left\lVert\mathbf{y}-\bm{\nu}\{f\}\right\rVert_{2}^{2}+\lambda\left\lVert f\right\rVert_{\theta}\right),

(139)

where $\mathbf{y}\in\mathbb{R}^{2^{N}\times 2^{N}}$ , and $\forall\mathbf{n}\in[1\cdots 2^{N}]^{2}$ ,

[\bm{\nu}\{f\}]_{\mathbf{n}}=\langle f,\mathbbm{1}_{E_{\mathbf{n}}^{N}}\rangle=\int_{E_{\mathbf{n}}^{N}}f(\mathbf{x})\mathrm{d}\mathbf{x}.

(140)

We know from Corollary 1 that $\mathcal{V}$ is nonempty, compact, and convex, and that its extreme points are of the form $\sum_{k=1}^{K}a_{k}\mathbbm{1}_{E_{k}}$ with $K\leq 2^{2N}$ . The discretized optimization problem is defined as

\mathcal{V}_{n}=\underset{f\in\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}^{+}(\mathcal{X}_{n})}{\mathrm{argmin}}\left(\left\lVert\mathbf{y}-\bm{\nu}\{f\}\right\rVert_{2}^{2}+\lambda\left\lVert f\right\rVert_{\theta}\right),

(141)

and a solution $f^{\star}_{n}\in\mathcal{V}_{n}$ can be obtained by solving (123). Finally, from Theorem 4, $\forall f^{\star}\in\mathcal{V}$ and $\forall\mathbf{n}\in[1\cdots 2^{N}]^{2}$ , we obtain

\underset{n\to\infty}{\lim}\int_{E_{\mathbf{n}}^{N}}f_{n}(\mathbf{x})\,\mathrm{d}\mathbf{x}=\int_{E_{\mathbf{n}}^{N}}f^{\star}(\mathbf{x})\,\mathrm{d}\mathbf{x}.

(142)

IV-B Resolution on Finite Grid

We argue that, to solve the continuous-domain problem in (139), it is sufficient to find the solution $f^{\star}_{N}\in\mathcal{V}_{N}$ . As a first step, we show in Theorem 5 that the sets of solutions $(\mathcal{V}_{n})_{n=N}^{\infty}$ are embedded one into another.

Theorem 5.

For all $N_{2}\geq N_{1}\geq N$ , the inclusions hold true

\mathcal{V}_{N_{1}}\subset\mathcal{V}_{N_{2}}\subset\mathcal{V}.

(143)

Proof of Theorem 5.

We define the downsampling operator

	$\displaystyle\mathrm{R}:\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}^{+}(\mathcal{X}_{n})$	$\displaystyle\to\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}^{+}(\mathcal{X}_{n-1})$
	$\displaystyle\sum_{\mathbf{n}=1,1}^{2^{n},2^{n}}\mathbf{a}[\mathbf{n}]\mathbbm{1}_{E_{\mathbf{n}}^{n}}$	$\displaystyle\mapsto\sum_{\mathbf{n}=1,1}^{2^{n-1},2^{n-1}}\mathbf{b}[\mathbf{n}]\mathbbm{1}_{E_{\mathbf{n}}^{(n-1)}}$		(144)

with $\mathbf{b}[\mathbf{n}]=\mathbf{b}[n_{1},n_{2}]$ equal to

\frac{1}{4}\big(\mathbf{a}[2n_{1}-1,2n_{2}-1]+\mathbf{a}[2n_{1}-1,2n_{2}]\\ +\mathbf{a}[2n_{1},2n_{2}-1]+\mathbf{a}[2n_{1},2n_{2}]\big).

(145)

Step 1. We claim that, $\forall f\in\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}^{+}(\mathcal{X}_{N_{1}})$ with $N_{1}>N$ ,

\mathcal{J}(\mathrm{R}\{f\})\leq\mathcal{J}(f).

(146)

A straightforward calculation yields that $\langle f,\mathbbm{1}_{E_{\mathbf{n}}^{N}}\rangle$ is equal to $\langle\mathrm{R}\{f\},\mathbbm{1}_{E_{\mathbf{n}}^{N}}\rangle$ and, consequently, that

\left\lVert\mathbf{y}-\bm{\nu}\{f\}\right\rVert_{2}^{2}=\left\lVert\mathbf{y}-\bm{\nu}\{\mathrm{R}\{f\}\}\right\rVert_{2}^{2}.

(147)

Next, we get from (127) that

	$\displaystyle\left\lVert[\mathrm{D}\otimes\mathrm{D}]\{f\}\right\rVert_{\mathcal{M}}$	$\displaystyle=\sum_{\mathbf{n}\in\mathbb{Z}^{2}}\|(\bm{h}_{1,1}\ast\mathbf{a})[\mathbf{n}]\|$
	$\displaystyle\left\lVert[\mathrm{D}\otimes\mathrm{D}]\{\mathrm{R}\{f\}\}\right\rVert_{\mathcal{M}}$	$\displaystyle=\sum_{\mathbf{n}\in\mathbb{Z}^{2}}\|(\bm{h}_{1,1}\ast\mathbf{b})[\mathbf{n}]\|.$		(148)

The injection of (145) in (148) and the use of several triangle inequalities (9 of them) yield that

$\displaystyle\|\bm{h}_{1,1}\ast\mathbf{b}\|[n_{1},n_{2}]\leq$	$\displaystyle\frac{1}{4}\|(\bm{h}_{1,1}\ast\mathbf{a})[2n_{1}-1,2n_{2}-1]\|$
$\displaystyle+$	$\displaystyle\frac{1}{2}\|(\bm{h}_{1,1}\ast\mathbf{a})[2n_{1}-1,2n_{2}]\|$
$\displaystyle+$	$\displaystyle\frac{1}{4}\|(\bm{h}_{1,1}\ast\mathbf{a})[2n_{1}-1,2n_{2}+1]\|$
$\displaystyle+$	$\displaystyle\frac{1}{2}\|(\bm{h}_{1,1}\ast\mathbf{a})[2n_{1},2n_{2}-1]\|$
$\displaystyle+$	$\displaystyle\|(\bm{h}_{1,1}\ast\mathbf{a})[2n_{1},2n_{2}]\|$
$\displaystyle+$	$\displaystyle\frac{1}{2}\|(\bm{h}_{1,1}\ast\mathbf{a})[2n_{1},2n_{2}+1]\|$
$\displaystyle+$	$\displaystyle\frac{1}{4}\|(\bm{h}_{1,1}\ast\mathbf{a})[2n_{1}+1,2n_{2}-1]\|$
$\displaystyle+$	$\displaystyle\frac{1}{2}\|(\bm{h}_{1,1}\ast\mathbf{a})[2n_{1}+1,2n_{2}]\|$
$\displaystyle+$	$\displaystyle\frac{1}{4}\|(\bm{h}_{1,1}\ast\mathbf{a})[2n_{1}+1,2n_{2}+1]\|.$	(149)

Finally, the sum of (149) over $(n_{1},n_{2})\in\mathbb{Z}^{2}$ yields that

\left\lVert[\mathrm{D}\otimes\mathrm{D}]\{\mathrm{R}\{f\}\}\right\rVert_{\mathcal{M}}\leq\left\lVert[\mathrm{D}\otimes\mathrm{D}]\{f\}\right\rVert_{\mathcal{M}},

(150)

and a similar calculation would show that

\left\lVert\nabla\{\mathrm{R}\{f\}\}\right\rVert_{\mathcal{M}^{2}}\leq\left\lVert\nabla\{f\}\right\rVert_{\mathcal{M}^{2}}.

(151)

The claim (146) follows from the combination of (147), (150), and (151).
Step 2. Let $f_{N_{1}}\in\mathcal{V}_{N_{1}}$ and $f_{N_{2}}\in\mathcal{V}_{N_{2}}$ . One can iterate Step 1 to find that

\mathcal{J}(f_{N_{1}})\leq\mathcal{J}(\mathrm{R}^{N_{2}-N_{1}}\{f_{N_{2}}\})\leq\mathcal{J}(f_{N_{2}}),

(152)

where the LHS inequality follows from the inclusion $\mathrm{R}^{N_{2}-N_{1}}\{f_{N_{2}}\}\in\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}^{+}(\mathcal{X}_{N_{1}})$ and the RHS inequality follows from (146). In addition, it follows from the inclusion

\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}^{+}(\mathcal{X}_{N_{1}})\subset\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}^{+}(\mathcal{X}_{N_{2}})

(153)

that $\mathcal{J}(f_{N_{2}})\leq\mathcal{J}(f_{N_{1}})$ , which, combined with (152), yields that $\mathcal{J}(f_{N_{2}})=\mathcal{J}(f_{N_{1}})$ . This proves the inclusion $\mathcal{V}_{N_{1}}\subset\mathcal{V}_{N_{2}}.$ Finally, if $(f_{n}^{\star})_{n=1}^{\infty}$ is a sequence of solutions with $f_{n}\in\mathcal{V}_{n}$ , then for some $f^{\star}\in\mathcal{V}$ ,

\underset{n\to\infty}{\text{lim}}\mathcal{J}(f_{n}^{\star})=\mathcal{J}(f^{\star}).

(154)

One can iterate (146) to find that, $\forall n\geq N_{2}$ ,

\mathcal{J}(f^{\star})\leq\mathcal{J}(f_{N_{2}})\leq\mathcal{J}(\mathrm{R}^{n-N_{2}}\{f_{n}\})\leq\mathcal{J}(f_{n}),

(155)

The combination of Equation (155) and (154) yields the inclusion $\mathcal{V}_{N_{2}}\subset\mathcal{V}.$ ∎

Theorem 5 reveals that a solution of the continuous-domain problem can be found by solving the optimization problem on $\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}^{+}(\mathcal{X}_{N})$ . The optimization over $\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}^{+}(\mathcal{X}_{N})$ has a unique solution.

Proposition 8.

There is a unique solution $f^{\star}\in\mathcal{V}$ whose knots lie on the grid $\mathcal{X}_{N}$ , i.e.,

\mathcal{V}_{N}=\{f^{\star}\}.

(156)

Proof of Proposition 8.

We know from Theorem (4) that there is a unique value $\mathbf{y}^{\star}\in\mathbb{R}^{2^{N}\times 2^{N}}$ such that, $\forall f^{\star}\in\mathcal{V}_{N}\subset\mathcal{V}$ ,

\mathbf{y}^{\star}=\bm{\nu}\{f^{\star}\}.

(157)

We conclude with the observation that, on $\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}^{+}(\mathcal{X}_{N})$ , the measurement operator $\bm{\nu}$ is a bijection. ∎

The combination of Theorem 5 and Proposition 8 yields that (123), specialised to the denoising task, has a unique solution which also happens to be a solution (through the synthesis operator) of the continuous-domain optimization problem.

IV-C Numerical Experiments

We fix the data fidelity as the squared difference and study the optimization problem in

\mathcal{V}_{\lambda,\theta}=\underset{f\in\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}^{+}(\mathrm{K})}{\text{argmin}}\left(\frac{1}{2}\left\lVert\mathbf{y}-\bm{\nu}\{f\}\right\rVert_{2}^{2}+\lambda\left\lVert f\right\rVert_{\theta}\right),

(158)

where $\bm{\nu}$ is defined in (142) and $\mathbf{y}\in\mathbb{R}^{2^{N}\times 2^{N}}$ is the image to denoise. We discretize this optimization problem, as in (141), on the canonical pixel basis $\mathcal{X}_{N}$ and denote by $f_{\lambda,\theta}^{\star}\in\mathcal{V}_{\lambda,\theta}$ its unique solution (Theorem 5 and Proposition 8). Then, we consider the synthesis formulation (123) of the discretized optimization problem in

\{\mathbf{a}_{\lambda,\theta}^{\star}\}=\underset{\mathbf{a}\in\mathbb{R}^{2^{N}\times 2^{N}}_{+}}{\text{argmin}}\left(\frac{1}{2}\left\lVert\mathbf{y}-\mathbf{a}\right\rVert_{2}^{2}+\lambda\left\lVert\bm{h}_{\theta}\ast\mathbf{a}\right\rVert_{1}\right),

(159)

whose regularization term has been reparametrized for convenience, with

\bm{h}_{\theta}=\left[\begin{array}[]{rr}\theta\bm{h}_{1,1}\\ \left(1-\frac{\theta}{2}\right)\bm{h}_{1,0}\\ \left(1-\frac{\theta}{2}\right)\bm{h}_{0,1}\\ \end{array}\right].

(160)

The convolution in (159) is applied element wise on the rows of $\bm{h}_{\theta}.$ Finally, since there is no closed form for the proximal operator of $\lambda\left\lVert\bm{h}_{\theta}\ast\mathbf{a}\right\rVert_{1}$ , we solve the dual problem. The latter is optimised with the FISTA algorithm [12]. In practice, our method extends without any issues to images that are not square. The denoising efficiency of our method, labelled MTV, is assessed on the BSD68 test set alongside two other benchmark methods. The results are summarised in Table I.

	$\sigma=5/255$	$\sigma=15/255$	$\sigma=25/255$
TV	36.41	29.91	27.48
MTV	36.83	30.20	27.72
CRR-NN [63]	36.96	30.55	28.11

TABLE I: Denoising performance on the BSD68 test set. The chosen metric is the PSNR, averaged over the 68 images. The parameters

\lambda

and

\theta

for TV and MTV have been optimised to maximise the PSNR. The method CRR-NN is included for comparison. In this method, the authors trained 24 filters of size

15\times 15

, parametrised by neural networks, on a Gaussian-denoising task.

To investigate the choice of the hyper-parameter $\theta$ , we denote by $\theta^{\star}_{\sigma}$ the value of $\theta$ that maximises the PSNR for a specific image and noise level $\sigma$ , and provide in Table II statistics on the distribution of $\theta^{\star}_{\sigma}$ .

	$\theta^{\star}_{5}$	$\theta^{\star}_{15}$	$\theta^{\star}_{25}$	$\|\theta^{\star}_{15}-\theta^{\star}_{25}\|$
Expectation	0.37	0.33	0.33	0.04
Standard deviation	0.07	0.08	0.12	0.04

TABLE II: Statistics of

\theta^{\star}_{\sigma}

taken over 68 samples, corresponding to the optimal values of

\theta

for the 68 images of the BSD68 test set, for each noise level.

We observe from the expectations in Table II that our method does not reduce to the TV one ( $\theta=0$ ). The optimal convex combination of $\nabla$ -based and $[\mathrm{D}\otimes\mathrm{D}]$ -based regularizations is non-trivial and appears necessary for efficient denoising of natural images. Furthermore, the column $|\theta^{\star}_{15}-\theta^{\star}_{25}|$ shows that the choice of the optimal $\theta$ is stable with respect to $\sigma$ , which, in turn, implies that the optimal $\theta$ depends mostly on the geometry of the image rather than on the noise level.

We observe that images in BSD68 for which MTV significantly outperforms TV, by more than $0.5$ dB, typically feature corner-like structures, such as in Image 1.

Refer to caption — Figure 1: Image 21 in BSD68 dataset. For $\sigma=25$ , the PSNR for MTV and TV are, respectively, $25.96$ and $25.29$ .

To better illustrate this behaviour, we consider a $722\times 1920$ image of New York City, free of rights, as shown in Figure 2. For the noise level $\sigma=25/255$ , the TV and MTV methods yield PSNR values of 24.70 and 25.52, respectively. We observe that bands of constant values and corners are reconstructed better with MTV.

V Conclusion

In this paper, we have shown that the convex combination of the TV regularization with the tensor product term $\left\lVert[\mathrm{D}\otimes\mathrm{D}]\{\cdot\}\right\rVert_{\mathcal{M}}$ promotes piecewise constant functions with edges parallel to the $x_{1}$ or the $x_{2}$ axis, as solutions to a continuous-domain optimization problem. In the particular case of denoising, we revealed that the optimization problem discretized on the canonical, pixel-based grid has a unique solution, which is also a solution of the continuous-domain problem. Finally, we demonstrated that the discretized problem can be solved with the same algorithm as for TV, with an additional filter, and that our new regularization significantly improves the reconstruction quality when the image has the appropriate underlying geometry.

Acknowledgment

Vincent Guillemet was supported by the Swiss National Science Foundation (SNSF) under Grant 200020_219356.

References

[1] M. Bertero, P. Boccacci, and C. De Mol, Introduction to Inverse Problems in Imaging. CRC press, 2021.
[2] M. T. McCann, M. Unser et al., “Biomedical image reconstruction: the foundations to deep neural networks,” Foundations and Trends® in Signal Processing, vol. 13, no. 3, pp. 283–359, 2019.
[3] H. Gupta, J. Fageot, and M. Unser, “Continuous-domain solutions of linear inverse problems with Tikhonov versus generalized TV regularization,” IEEE Transactions on Signal Processing, vol. 66, no. 17, pp. 4670–4684, 2018.
[4] M. Unser and J. Fageot, “Native Banach spaces for splines and variational inverse problems,” arXiv preprint arXiv:1904.10818, 2019.
[5] L. I. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variation based noise removal algorithms,” Physica D: Nonlinear Phenomena, vol. 60, no. 1-4, pp. 259–268, 1992.
[6] L. Ambrosio, N. Fusco, and D. Pallara, Functions of Bounded Variation and Free Discontinuity Problems. Oxford University Press, 2000.
[7] L. Evans, Measure Theory and Fine Properties of Functions. Routledge, 2018.
[8] W. Rudin, Real and Complex Analysis, 3rd ed. New York: McGraw-Hill, 1987.
[9] D. L. Donoho, “Compressed sensing,” IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289–1306, 2006.
[10] E. Candes and J. Romberg, “Sparsity and incoherence in compressive sampling,” Inverse Problems, vol. 23, no. 3, p. 969, 2007.
[11] A. M. Bruckstein, D. L. Donoho, and M. Elad, “From sparse solutions of systems of equations to sparse modeling of signals and images,” SIAM Review, vol. 51, no. 1, pp. 34–81, 2009.
[12] A. Beck and M. Teboulle, “Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems,” IEEE Transactions on Image Processing, vol. 18, no. 11, pp. 2419–2434, 2009.
[13] M. Zhu and T. Chan, “An efficient primal-dual hybrid gradient algorithm for total variation image restoration,” Ucla Cam Report, vol. 34, no. 2, 2008.
[14] E. Esser, X. Zhang, and T. F. Chan, “A general framework for a class of first order primal-dual algorithms for convex optimization in imaging science,” SIAM Journal on Imaging Sciences, vol. 3, no. 4, pp. 1015–1046, 2010.
[15] A. Chambolle and T. Pock, “A first-order primal-dual algorithm for convex problems with applications to imaging,” Journal of Mathematical Imaging and Vision, vol. 40, pp. 120–145, 2011.
[16] L. Condat, “A primal–dual splitting method for convex optimization involving lipschitzian, proximable and linear composite terms,” Journal of Optimization Theory and Applications, vol. 158, no. 2, pp. 460–479, 2013.
[17] R. Caccioppoli, “Misura e integrazione sugli insiemi dimensionalmente orientati,” Atti Accad. Naz. Lincei. Rend. Cl. Sci. Fis. Mat. Nat.(8), vol. 12, pp. 3–11, 1952.
[18] E. De Giorgi, “Su una teoria generale della misura (r- 1)-dimensionale in uno spazio ad r dimensioni,” Annali di Matematica Pura ed Applicata, vol. 36, pp. 191–213, 1954.
[19] A. Flinth and P. Weiss, “Exact solutions of infinite dimensional total-variation regularized problems,” Information and Inference: A Journal of the IMA, vol. 8, no. 3, pp. 407–443, 2019.
[20] M. Unser, J. Fageot, and J. P. Ward, “Splines are universal solutions of linear inverse problems with generalized TV regularization,” SIAM Review, vol. 59, no. 4, pp. 769–793, 2017.
[21] M. Unser, “A unifying representer theorem for inverse problems and machine learning,” Foundations of Computational Mathematics, vol. 21, no. 4, pp. 941–960, 2021.
[22] C. Boyer, A. Chambolle, Y. D. Castro, V. Duval, F. De Gournay, and P. Weiss, “On representer theorems and convex regularization,” SIAM Journal on Optimization, vol. 29, no. 2, pp. 1260–1281, 2019.
[23] K. Bredies and M. Carioni, “Sparsity of solutions for variational inverse problems with finite-dimensional data,” Calculus of Variations and Partial Differential Equations, vol. 59, no. 1, p. 14, 2020.
[24] W. H. Fleming, “Functions with generalized gradient and generalized surfaces,” Annali di Matematica Pura ed Applicata, vol. 44, no. 1, pp. 93–103, 1957.
[25] ——, “Functions whose partial derivatives are measures,” Illinois Journal of Mathematics, vol. 4, no. 3, pp. 452–478, 1960.
[26] L. Ambrosio, V. Caselles, S. Masnou, and J.-M. Morel, “Connected components of sets of finite perimeter and applications to image processing,” Journal of the European Mathematical Society, vol. 3, no. 1, pp. 39–92, 2001.
[27] M. Unser, “Splines: A perfect fit for signal and image processing,” IEEE Signal Processing Magazine, vol. 16, no. 6, pp. 22–38, 1999.
[28] T. Debarre, J. Fageot, H. Gupta, and M. Unser, “B-spline-based exact discretization of continuous-domain inverse problems with generalized TV regularization,” IEEE Transactions on Information Theory, vol. 65, no. 7, pp. 4457–4470, 2019.
[29] T. Debarre, Q. Denoyelle, and J. Fageot, “On the uniqueness of solutions for the basis pursuit in the continuum,” Inverse Problems, vol. 38, no. 12, p. 125005, 2022.
[30] V. Guillemet, J. Fageot, and M. Unser, “Convergence analysis of the discretization of continuous-domain inverse problems,” Inverse Problems, vol. 41, no. 4, p. 045008, 2025.
[31] J. Fageot, “Variational seasonal-trend decomposition with sparse continuous-domain regularization,” arXiv preprint arXiv:2505.10486, 2025.
[32] V. Guillemet and M. Unser, “Variational tensor product splines,” En Preparation, 2025.
[33] K. Bredies, K. Kunisch, and T. Pock, “Total generalized variation,” SIAM Journal on Imaging Sciences, vol. 3, no. 3, pp. 492–526, 2010.
[34] K. Bredies and M. Holler, “Regularization of linear inverse problems with total generalized variation,” Journal of Inverse and Ill-Posed Problems, vol. 22, no. 6, pp. 871–913, 2014.
[35] F. Demengel, “Fonctions à hessien borné,” in Annales de l’institut Fourier, vol. 34, no. 2, 1984, pp. 155–190.
[36] S. Lefkimmiatis, A. Bourquard, and M. Unser, “Hessian-based norm regularization for image restoration with biomedical applications,” IEEE Transactions on Image Processing, vol. 21, no. 3, pp. 983–995, 2011.
[37] S. Lefkimmiatis, J. P. Ward, and M. Unser, “Hessian Schatten-norm regularization for linear inverse problems,” IEEE Transactions on Image Processing, vol. 22, no. 5, pp. 1873–1888, 2013.
[38] S. Aziznejad, J. Campos, and M. Unser, “Measuring complexity of learning schemes using Hessian-Schatten total variation,” SIAM Journal on Mathematics of Data Science, vol. 5, no. 2, pp. 422–445, 2023.
[39] J. Campos, S. Aziznejad, and M. Unser, “Learning of continuous and piecewise-linear functions with Hessian total-variation regularization,” IEEE Open Journal of Signal Processing, vol. 3, pp. 36–48, 2022.
[40] M. Pourya, A. Goujon, and M. Unser, “Delaunay-triangulation-based learning with Hessian total-variation regularization,” IEEE Open Journal of Signal Processing, vol. 4, pp. 167–178, 2023.
[41] M. Pourya, A. Boquet-Pujadas, and M. Unser, “A box-spline framework for inverse problems with continuous-domain sparsity constraints,” IEEE Transactions on Computational Imaging, 2024.
[42] L. Ambrosio, C. Brena, and S. Conti, “Functions with bounded Hessian–Schatten variation: Density, variational, and extremality properties,” Archive for Rational Mechanics and Analysis, vol. 247, no. 6, p. 111, 2023.
[43] L. Ambrosio, S. Aziznejad, C. Brena, and M. Unser, “Linear inverse problems with Hessian–Schatten total variation,” Calculus of Variations and Partial Differential Equations, vol. 63, no. 1, p. 9, 2024.
[44] M. Unser and S. Aziznejad, “Convex optimization in sums of Banach spaces,” Applied and Computational Harmonic Analysis, vol. 56, pp. 1–25, 2022.
[45] L. E. Dubins, “On extreme points of convex sets,” Journal of Mathematical Analysis and Applications, vol. 5, no. 2, pp. 237–244, 1962.
[46] R. Parhi and M. Unser, “The sparsity of cycle spinning for wavelet-based solutions of linear inverse problems,” IEEE Signal Processing Letters, vol. 30, pp. 568–572, 2023.
[47] M. Unser and S. Ducotterd, “Universal architectures for the learning of polyhedral norms and convex regularizers,” arXiv preprint arXiv:2503.19190, 2025.
[48] G. Bellettini, V. Caselles, and M. Novaga, “The total variation flow in rn,” Journal of Differential Equations, vol. 184, no. 2, pp. 475–525, 2002.
[49] F. Andreu, C. Ballester, V. Caselles, and J. M. Mazón, “Minimizing total variation flow,” 2001.
[50] F. Andreu, V. Caselles, J. I. Díaz, and J. M. Mazón, “Some qualitative properties for the total variation flow,” Journal of Functional Analysis, vol. 188, no. 2, pp. 516–547, 2002.
[51] V. Caselles, A. Chambolle, and M. Novaga, “Regularity for solutions of the total variation denoising problem,” Revista Matemática Iberoamericana, vol. 27, no. 1, pp. 233–252, 2011.
[52] M.-J. Lai, B. Lucier, and J. Wang, “The convergence of a central-difference discretization of Rudin-Osher-Fatemi model for image denoising,” in Scale Space and Variational Methods in Computer Vision: Second International Conference, SSVM 2009, Voss, Norway, June 1-5, 2009. Proceedings 2. Springer, 2009, pp. 514–526.
[53] S. Bartels, “Total variation minimization with finite elements: convergence and iterative solution,” SIAM Journal on Numerical Analysis, vol. 50, no. 3, pp. 1162–1180, 2012.
[54] J. Wang and B. J. Lucier, “Error bounds for finite-difference methods for Rudin–Osher–Fatemi image smoothing,” SIAM Journal on Numerical Analysis, vol. 49, no. 2, pp. 845–868, 2011.
[55] A. Chambolle and T. Pock, “Approximating the total variation with finite differences or finite elements,” in Handbook of Numerical Analysis. Elsevier, 2021, vol. 22, pp. 383–417.
[56] D. L. Cohn, Measure Theory. Springer, 2013, vol. 2.
[57] O. Kallenberg et al., Random Measures, Theory and Applications. Springer, 2017, vol. 1.
[58] H. Federer, “Curvature measures,” Transactions of the American Mathematical Society, vol. 93, no. 3, pp. 418–491, 1959.
[59] W. H. Fleming and R. Rishel, “An integral formula for total gradient variation,” Archiv der Mathematik, vol. 11, pp. 218–222, 1960.
[60] L. Rotem, “The anisotropic total variation and surface area measures,” in Geometric Aspects of Functional Analysis: Israel Seminar (GAFA) 2020-2022. Springer, 2023, pp. 297–312.
[61] G. Dolzmann and S. Müller, “Microstructures with finite surface energy: the two-well problem,” Archive for Rational Mechanics and Analysis, vol. 132, pp. 101–141, 1995.
[62] S. Bonettini, S. Rebegoldi, and V. Ruggiero, “Inertial variable metric techniques for the inexact forward–backward algorithm,” SIAM Journal on Scientific Computing, vol. 40, no. 5, pp. A3180–A3210, 2018.
[63] A. Goujon, S. Neumayer, P. Bohra, S. Ducotterd, and M. Unser, “A neural-network-based convex regularizer for inverse problems,” IEEE Transactions on Computational Imaging, vol. 9, pp. 781–795, 2023.

-A Proofs

Proof of Lemma 1.

Construction. Consider the partition $(A_{k}^{n})_{k=1}^{n}$ of $[0,1]$ with

A_{k}^{n}=\begin{cases}\left]\frac{k-1}{n},\frac{k}{n}\right],&k>1,\\ \left[\frac{k-1}{n},\frac{k}{n}\right],&k=1,\end{cases}

(161)

and the partition $(A_{k,k^{\prime}}^{n})_{k,k^{\prime}=1}^{n,n}$ of $\mathrm{K}$ with $A_{k,k^{\prime}}^{n}=A_{k}^{n}\times A_{k^{\prime}}^{n}.$ We define $m=[\mathrm{D}\otimes\mathrm{D}]\{f\}$ and

f_{n}=\sum_{k=1}^{n}\sum_{k^{\prime}=1}^{n}m(A_{k,k^{\prime}}^{n})\,u\left(\cdot-\frac{k}{n}\right)\otimes u\left(\cdot-\frac{k^{\prime}}{n}\right).

(162)

We calculate that

m_{n}=[\mathrm{D}\otimes\mathrm{D}]\{f_{n}\}=\sum_{k=1}^{n}\sum_{k^{\prime}=1}^{n}m(A_{k,k^{\prime}}^{n})\,\delta_{\frac{k}{n}}\otimes\delta_{\frac{k^{\prime}}{n}}\in\mathcal{M}(\mathrm{K}),

(163)

and observe that, by construction, $f_{n}$ is compactly supported in $\mathrm{K}$ . Therefore, $f_{n}\in\mathcal{M}_{\mathrm{D}\otimes\mathrm{D},0}(\mathrm{K})$ .
Item 1. For $[\mathrm{D}\otimes\mathrm{D}]\{\phi\}\in[\mathrm{D}\otimes\mathrm{D}]\{\mathcal{C}_{0}(\mathbb{R}^{2})\}$ , we calculate that

	$\displaystyle\|\langle f-f_{n},[\mathrm{D}\otimes\mathrm{D}]\{\phi\}\rangle\|$
$\displaystyle=$	$\displaystyle\|\langle m-m_{n},\phi\rangle\|$
$\displaystyle\leq\;$	$\displaystyle\sum_{k,k^{\prime}=1,1}^{n,n}\left\|\int_{A_{k,k^{\prime}}^{n}}\phi(\mathbf{x})\,\mathrm{d}m(\mathbf{x})-m(A_{n}^{k})\phi\left(\frac{k}{n},\frac{k^{\prime}}{n}\right)\right\|$
$\displaystyle\leq\;$	$\displaystyle\sum_{k^{\prime},k=1}^{n,n}\int_{A_{k,k^{\prime}}^{n}}\left\|\phi(\mathbf{x})-\phi\left(\frac{k}{n},\frac{k^{\prime}}{n}\right)\right\|\mathrm{d}m(\mathbf{x})$
$\displaystyle\leq\;$	$\displaystyle\left\lVert m\right\rVert_{\mathcal{M}}\underset{\left\lVert\mathbf{x}-\mathbf{t}\right\rVert\leq\frac{\sqrt{2}}{n}}{\text{sup}}\|\phi(\mathbf{x})-\phi(\mathbf{t})\|\underset{n\to\infty}{\to}0,$	(164)

because $\phi$ is uniformly continuous. This shows the $\mathrm{weak}^{\star}$ -convergence.
Item 2. The observation that $\left\lVert m_{n}\right\rVert_{\mathcal{M}}\leq\left\lVert m\right\rVert_{\mathcal{M}}$ and the $\mathrm{weak}^{\star}$ -convergence proved in Item 1 imply that

\left\lVert m\right\rVert_{\mathcal{M}}\leq\underset{n\to\infty}{\text{liminf}}\left\lVert m_{n}\right\rVert_{\mathcal{M}}\leq\underset{n\to\infty}{\text{limsup}}\left\lVert m_{n}\right\rVert_{\mathcal{M}}\leq\left\lVert m\right\rVert_{\mathcal{M}}.

(165)

Item 3. We calculate that

\displaystyle f_{n}(x_{1},x_{2})=\int_{0}^{x_{1}}\int_{0}^{x_{2}}\mathrm{d}m_{n}(t_{1},t_{2})\leq\left\lVert m_{n}\right\rVert_{\mathcal{M}}\leq\left\lVert m\right\rVert_{\mathcal{M}}.

(166)

It follows that the family $\{(f_{n})_{n=1}^{\infty},f\}$ is upper-bounded by $\left\lVert m\right\rVert_{\mathcal{M}}\mathbbm{1}_{\mathrm{K}}$ . In addition, for $(x_{1},x_{2})\in A_{k,k^{\prime}}^{n}$ , we calculate that

	$\displaystyle f(x_{1},x_{2})-f_{n}(x_{1},x_{2})$
$\displaystyle=$	$\displaystyle m([0,x_{1}[\times[0,x_{2}[)-m\left(\left[0,\frac{k-1}{n}\right]\times\left[0,\frac{k^{\prime}-1}{n}\right]\right)$
$\displaystyle=$	$\displaystyle m\left(\left[0,\frac{k-1}{n}\right]\times\left]\frac{k^{\prime}-1}{n},x_{2}\right[\right)$
	$\displaystyle+m\left(\left]\frac{k-1}{n},x_{1}\right[\times\left[0,\frac{k^{\prime}-1}{n}\right]\right)$
	$\displaystyle+m\left(\left]x_{1},\frac{k-1}{n}\right[\times\left]x_{2},\frac{k^{\prime}-1}{n}\right[\right)\underset{n\to\infty}{\to}0.$	(167)

In (-A), we used the observation that taking the measure $m$ on the closed square or the half-open square yields functions that belong to the same equivalence class. The convergence follows from the continuity from above of measures. Consequently, since the sequence $(f_{n})_{n=1}^{\infty}$ is upper-bounded by an $\mathcal{L}_{1}(\mathrm{K})$ function and converges pointwise almost everywhere to $f$ , we conclude by the Lebesgue dominated convergence theorem (LDC) that it converges in $\mathcal{L}_{1}(\mathrm{K})$ to $f$ .
Item 4: We first recall that, for $\mathrm{K}\subset\Omega$ ,

\left\lVert\nabla\{f_{n}\}\right\rVert_{\mathcal{M}^{2}}=\left\lVert\nabla\{f_{n}\}\right\rVert_{\mathcal{M}(\mathrm{K})^{2}}=\left\lVert\nabla\{f_{n}\}\right\rVert_{\mathcal{M}(\Omega)^{2}}.

(168)

Since $f_{n}\in\mathrm{BV}(\Omega)$ , one has from the theory of BV functions that $\left\lVert\nabla\{f_{n}\}\right\rVert_{\mathcal{M}(\Omega)^{2}}$ is equal to

\displaystyle\text{sup}\left\{\int_{\Omega}\nabla\{f_{n}\}(\mathbf{x})\cdot\phi(\mathbf{x})\,\mathrm{d}\mathbf{x}:\phi\in\mathcal{C}_{c}^{1}(\Omega,\mathbb{R}^{2}),\left\lVert\phi\right\rVert\leq 1\right\}.

(169)

For $\phi\in\mathcal{C}_{c}^{1}(\Omega)$ , we calculate that

	$\displaystyle\langle[\mathrm{I}\otimes\mathrm{D}]\{f-f_{n}\},\phi\rangle$	$\displaystyle=\langle(u\otimes\delta_{0})\ast(m-m_{n}),\phi\rangle$
		$\displaystyle=\langle m-m_{n},(u^{\vee}\otimes\delta_{0})\ast\phi\rangle\underset{n\to\infty}{\to}0,$		(170)

because, since $m-m_{n}$ is compactly supported in $\mathrm{K}$ , we can use the same argument as in (164) with $(u^{\vee}\otimes\delta_{0})\ast\phi\in\mathcal{C}(\mathbb{R}^{2})$ . It follows that, $\forall\phi\in\mathcal{C}_{c}^{1}(\Omega,\mathbb{R}^{2})$ ,

\langle\nabla\{f_{n}-f\},\phi\rangle\underset{n\to\infty}{\to}0\Rightarrow\left\lVert\nabla\{f\}\right\rVert_{\mathcal{M}^{2}}\leq\underset{n\to\infty}{\mathrm{liminf}}\left\lVert\nabla\{f_{n}\}\right\rVert_{\mathcal{M}^{2}}.

(171)

To continue, we observe that

\left\lVert[\mathrm{I}\otimes\mathrm{D}]\{f_{n}\}\right\rVert_{\mathcal{M}}=\left\lVert(u\otimes\delta_{0})\ast m_{n}\right\rVert_{\mathcal{M}}=\\ \int_{0}^{1}\left\lVert m_{n}([0,t[\times\{\cdot\})\right\rVert_{\mathcal{M}(\mathbb{R})}\,\mathrm{d}t.

(172)

Equation (172) also holds with $f_{n}$ replaced by $f$ , and $m_{n}$ by $m$ . If $t\in A_{k}^{n},$ then

	$\displaystyle\left\lVert m_{n}([0,t[\times\{\cdot\})\right\rVert_{\mathcal{M}(\mathbb{R})}$
$\displaystyle=$	$\displaystyle\sum_{k^{\prime}=1}^{n}\left\|m_{n}\left([0,t[\times\left\{\frac{k^{\prime}}{n}\right\}\right)\right\|$
$\displaystyle=$	$\displaystyle\sum_{k^{\prime}=1}^{n}\left\|\sum_{\ell=1}^{k-1}m(A_{\ell,k^{\prime}}^{n})\right\|$
$\displaystyle=$	$\displaystyle\sum_{k^{\prime}=1}^{n}\left\|m\left(\left[0,\frac{k-1}{n}\right]\times A_{k^{\prime}}^{n}\right)\right\|$
$\displaystyle\leq$	$\displaystyle\sum_{k^{\prime}=1}^{n}\left\|m\left(\left[0,t\right[\times A_{k^{\prime}}^{n}\right)\right\|+\sum_{k^{\prime}=1}^{n}\left\|m\left(\left]\frac{k-1}{n},t\right[\times A_{k^{\prime}}^{n}\right)\right\|$
$\displaystyle\leq$	$\displaystyle\left\lVert m([0,t[\times\{\cdot\})\right\rVert_{\mathcal{M}(\mathbb{R})}+\left\lVert m\left(\left]\frac{k-1}{n},t\right[\times\{\cdot\}\right)\right\rVert_{\mathcal{M}(\mathbb{R})}.$	(173)

It follows that

\left\lVert[\mathrm{I}\otimes\mathrm{D}]\{f_{n}\}\right\rVert_{\mathcal{M}}\leq\left\lVert[\mathrm{I}\otimes\mathrm{D}]\{f\}\right\rVert_{\mathcal{M}}\\ +\int_{0}^{1}\left\lVert m(]\text{proj}_{n}(t),t[\times\{\cdot\})\right\rVert_{\mathcal{M}(\mathbb{R})}\,\mathrm{d}t,

(174)

where $\text{proj}_{n}(t)$ returns the largest $(k-1)$ such that $\frac{k-1}{n}\leq t$ . We write $T_{n}=]\text{proj}_{n}(t),t[$ and observe that a similar argumentation will show that

\left\lVert[\mathrm{D}\otimes\mathrm{I}]\{f_{n}\}\right\rVert_{\mathcal{M}}\leq\left\lVert[\mathrm{D}\otimes\mathrm{I}\{f\}\right\rVert_{\mathcal{M}}\\ +\int_{0}^{1}\left\lVert m(\{\cdot\}\times T_{n})\right\rVert_{\mathcal{M}(\mathbb{R})}\,\mathrm{d}t.

(175)

Next, the combination of (171), (174), and (175) yields that

\left\lVert\nabla\{f\}\right\rVert_{\mathcal{M}^{2}}\leq\underset{n\to\infty}{\mathrm{limsup}}\left\lVert\nabla\{f_{n}\}\right\rVert_{\mathcal{M}^{2}}\leq\left\lVert\nabla\{f\}\right\rVert_{\mathcal{M}^{2}}+C

(176)

with

C=\underset{n\to\infty}{\text{limsup }}\int_{0}^{1}\Big(\left\lVert m(T_{n}\times\{\cdot\})\right\rVert_{\mathcal{M}(\mathbb{R})}\\ +\left\lVert m(\{\cdot\}\times T_{n})\right\rVert_{\mathcal{M}(\mathbb{R})}\Big)\,\mathrm{d}t.

(177)

Finally, we observe that

\Big(\left\lVert m(T_{n}\times\{\cdot\})\right\rVert_{\mathcal{M}(\mathbb{R})}+\left\lVert m(\{\cdot\}\times T_{n})\right\rVert_{\mathcal{M}(\mathbb{R})}\Big)\leq 2\left\lVert m\right\rVert_{\mathcal{M}},

(178)

and that

\underset{n\to\infty}{\text{lim}}\Big(\left\lVert m(T_{n}\times\{\cdot\})\right\rVert_{\mathcal{M}(\mathbb{R})}+\left\lVert m(\{\cdot\}\times T_{n})\right\rVert_{\mathcal{M}(\mathbb{R})}\Big)=0.

(179)

because

\bigcap_{n}T_{n}=\emptyset.

(180)

Equation (179) follows from the continuity from above of the total variation measure. It follows from (178) and (179) that one can apply the LDC to find that $C=0$ . The combination of Equations (171) and (176) imply the desired equality. ∎

$\displaystyle\|\bm{h}_{1,1}\ast\mathbf{b}\|[n_{1},n_{2}]\leq$	$\displaystyle\frac{1}{4}\|(\bm{h}_{1,1}\ast\mathbf{a})[2n_{1}-1,2n_{2}-1]\|$
$\displaystyle+$	$\displaystyle\frac{1}{2}\|(\bm{h}_{1,1}\ast\mathbf{a})[2n_{1}-1,2n_{2}]\|$
$\displaystyle+$	$\displaystyle\frac{1}{4}\|(\bm{h}_{1,1}\ast\mathbf{a})[2n_{1}-1,2n_{2}+1]\|$
$\displaystyle+$	$\displaystyle\frac{1}{2}\|(\bm{h}_{1,1}\ast\mathbf{a})[2n_{1},2n_{2}-1]\|$
$\displaystyle+$	$\displaystyle\|(\bm{h}_{1,1}\ast\mathbf{a})[2n_{1},2n_{2}]\|$
$\displaystyle+$	$\displaystyle\frac{1}{2}\|(\bm{h}_{1,1}\ast\mathbf{a})[2n_{1},2n_{2}+1]\|$
$\displaystyle+$	$\displaystyle\frac{1}{4}\|(\bm{h}_{1,1}\ast\mathbf{a})[2n_{1}+1,2n_{2}-1]\|$
$\displaystyle+$	$\displaystyle\frac{1}{2}\|(\bm{h}_{1,1}\ast\mathbf{a})[2n_{1}+1,2n_{2}]\|$
$\displaystyle+$	$\displaystyle\frac{1}{4}\|(\bm{h}_{1,1}\ast\mathbf{a})[2n_{1}+1,2n_{2}+1]\|.$	(149)