Mixed-Derivative Total Variation
Abstract
The formulation of norms on continuous-domain Banach spaces with exact pixel-based discretization is advantageous for solving inverse problems (IPs). In this paper, we investigate a new regularization that is a convex combination of a TV term and the norm of mixed derivatives. We show that the extreme points of the corresponding unit ball are indicator functions of polygons whose edges are aligned with either the $x$- or the $y$-axis. We then apply this result to construct a new regularization for IPs, which can be discretized exactly by tensor products of first-order B-splines or, equivalently, pixels. Furthermore, we exactly discretize the loss of the denoising problem on its canonical pixel basis and prove that it admits a unique solution, which is also a solution to the underlying continuous-domain IP.
Index Terms:
Geometric measure theory, continuous-domain inverse problems, tensor product, representer theorem, denoising.
I Introduction
The study of continuous-domain formulations of inverse problems (IPs) is crucial in computational imaging [1, 2]. In this setting, the objective is to recover an unknown image from noisy measurements. Due to the pixel-based geometry of images, such problems are commonly addressed by seeking an approximate solution in
(1) |
where the sets form a pixel-based decomposition of the domain. In several acquisition modalities, the forward measurement process is not naturally associated with a canonical pixel basis, in which case the basis must be designed by the practitioner. In this context, it is advantageous to identify a continuous-domain Banach space that contains all pixel-type basis functions and within which the pixel basis can be refined. The search space is typically selected jointly with the regularization functional [3, 4], which encodes prior knowledge about the geometry of the solution when the IP is ill-posed.
I-A TV Regularization
A standard regularization for the pixel basis is the total variation (TV) [5] which, for functions of bounded variation [6, 7] and in its anisotropic form, is given by
(2) |
where the norm is that of the space of bounded measures [8]. This norm has the advantage of being exactly discretizable on the pixel basis, which makes it particularly suitable for image processing. Specifically, when the function is a linear combination of square pixels with given weights, (2) can be discretized as
(3) |
where the scaling constant is determined by the pixel size, and
(4) |
The search for the optimal weights thus reduces to a compressed-sensing problem [9, 10, 11], which can be solved with classical algorithms such as FISTA [12] or primal-dual methods [13, 14, 15, 16]. One reason for the popularity of TV regularization is the strong theoretical foundation on which it is built.
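To make the discretization in (3)-(4) concrete, the following sketch evaluates the anisotropic discrete TV of an array of pixel weights. The variable names (c for the weight array, h for the pixel size) are our own, and the formula is the standard first-difference discretization that matches the description above.

```python
import numpy as np

def anisotropic_tv(c: np.ndarray, h: float = 1.0) -> float:
    """Anisotropic discrete TV of the pixel weights c, in the spirit of (3)-(4).

    Sums the absolute first-order finite differences along both axes;
    the factor h accounts for the pixel size.
    """
    dx = np.abs(np.diff(c, axis=0)).sum()  # horizontally adjacent pixels
    dy = np.abs(np.diff(c, axis=1)).sum()  # vertically adjacent pixels
    return h * (dx + dy)

# The indicator of an axis-aligned rectangle has TV equal to its perimeter.
c = np.zeros((8, 8))
c[2:6, 3:7] = 1.0                 # 4 x 4 block of unit pixels
print(anisotropic_tv(c))          # 16.0, the perimeter in pixel units
```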
The TV norm is well defined not only for pixels, but also for indicator functions of Caccioppoli sets [17, 18], i.e., sets that can be approximated in measure by polygons. Within the framework of representer theorems (RTs) for Banach spaces [19, 20, 21, 22], Bredies and Carioni [23] built upon the results of [24, 25, 26] to show that TV regularization promotes solutions to inverse problems that are finite linear combinations of indicators of simple sets, i.e., Caccioppoli sets whose boundaries are represented by a unique Jordan curve [26].
I-B Mixed-Derivative Regularization
One approach to promoting pixel-type solutions with a spline structure is to impose additional regularity. In [32], the authors considered a search space consisting of functions whose generalized gradient and generalized mixed derivative are bounded measures supported in the unit square. They showed that the extreme points of the unit ball associated with the mixed-derivative norm are tensor products of B-splines. However, this formulation is not entirely satisfactory because the regularization does not control the perimeter of sets, but only their number of corners. As a result, it may promote solutions containing undesirable components, such as indicators of unbounded sets.
I-C Contributions
In this paper, for a fixed tradeoff parameter, we regularize with the new norm
(5) |
which generalizes the anisotropic TV.
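For concreteness, with a weight $\alpha \in [0,1]$ (our assumed parameterization of the tradeoff), such a convex combination reads
\[
\|f\|_{\alpha} \;=\; \alpha\,\|\mathbf{D}f\|_{\mathcal{M}} \;+\; (1-\alpha)\,\|\mathrm{D}_{xy}f\|_{\mathcal{M}},
\]
where the first term is the anisotropic TV of (2), the second is the total mass of the mixed derivative $\mathrm{D}_{xy}f=\partial_x\partial_y f$, and the symbols $\alpha$, $\mathbf{D}$, and $\mathrm{D}_{xy}$ are our notation for this sketch.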
Our contributions are fourfold.
Functional Framework: We define the search space as the set of functions, supported in the unit square, whose gradient and mixed derivative are bounded measures. We prove that this space is a dual Banach space for the new norm, and we establish that the norms associated with all values of the tradeoff parameter are equivalent.
Extreme Point Characterization: We show that the extreme points of the unit ball of the convex cone of positive functions are indicator functions of polygons whose edges are parallel to either the $x$- or the $y$-axis. We then apply this result to the regularization of abstract IPs with finite data.
Resolution of IPs: We demonstrate that tensor-product B-splines of order 1 exactly discretize a broad class of IP loss functionals, and that the regularization can be discretized exactly as
(6) |
where the sum runs over the pixel weights. We further prove that these exact discretizations converge, in an appropriate sense, to the continuous-domain IP as the pixel size tends to zero (a discrete sketch of this norm is given after this list).
Application to Denoising: We prove that the canonical pixel-based discretization of the denoising loss admits a unique solution, which also solves the corresponding continuous-domain problem. Finally, we numerically assess the efficiency of the proposed method.
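As referenced in the third contribution, a plausible reading of the exact discretization (6) is that the mixed-derivative term reduces to the absolute second-order mixed finite differences of the pixel weights; the following sketch (with our own helper names) evaluates the resulting discrete norm.

```python
import numpy as np

def mixed_derivative_norm(c: np.ndarray) -> float:
    """Sum of absolute mixed second differences of the pixel weights.

    For the indicator of an axis-aligned polygon, each corner contributes
    a unit Dirac mass to the mixed derivative, so this sum counts corners.
    """
    dxy = np.diff(np.diff(c, axis=0), axis=1)  # c[i+1,j+1]-c[i+1,j]-c[i,j+1]+c[i,j]
    return np.abs(dxy).sum()

def mixed_tv(c: np.ndarray, alpha: float, h: float = 1.0) -> float:
    """Convex combination of the discrete TV and the mixed-derivative term."""
    tv = h * (np.abs(np.diff(c, axis=0)).sum() + np.abs(np.diff(c, axis=1)).sum())
    return alpha * tv + (1.0 - alpha) * mixed_derivative_norm(c)

c = np.zeros((8, 8))
c[2:6, 3:7] = 1.0
print(mixed_derivative_norm(c))  # 4.0: one unit mass per rectangle corner
```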
I-D Related Works
Higher-Order Generalization of TV. Bredies et al. introduced the generalized total variation (GTV) [33, 34, 23], which incorporates all derivatives up to a specified order. In particular, the second-order TV is defined on the space of functions with bounded Hessian [35] and, for a smooth function, takes the form
(7) |
where the matrix norm in (7) is the Frobenius norm. Lefkimmiatis et al. extended this notion by introducing the Hessian TV (HTV), in which the Frobenius norm is replaced by a Schatten-$p$ norm [36, 37, 38]. Regularization with HTV has been employed for the learning of continuous piecewise-linear (CPWL) mappings [39, 40, 41]. Ambrosio et al. showed that the closure, in the unit ball of the space of functions with finite HTV, of the set of CPWL extreme points contains all extremal points [42, 43].
Our regularization lies, in terms of regularity constraints, between TV and HTV. Specifically, it enforces more regularity than TV due to the presence of the mixed-derivative term, but less than HTV because it omits the pure second derivatives in $x$ and $y$. In particular, HTV is not well defined on the pixel basis.
Extreme Points of Mixed Norms. We investigate the extreme points of the unit ball associated with our regularization norm, which takes the form
where the two constituent norms have known extreme points. When the search space can be decomposed as a direct sum adapted to the two norms, the extreme points of the unit ball have been systematically characterized [44]. However, when such a direct-sum decomposition is not possible, as is the case in this work, the description of the extreme points becomes significantly more challenging. In finite dimensions, they are located at the intersections of the hyperplanes that define the boundaries of the unit balls [45, 46, 47]. To the best of our knowledge, no analogous characterization is currently available in infinite-dimensional settings.
Continuous Data. TV was originally introduced as a regularization for a denoising loss with continuous data over a domain [5]. Subsequent works have deepened the theoretical understanding of this approach. The existence and uniqueness of solutions for this problem, as well as for the TV flow problem, were established in [48, 49, 50]. The regularity of solutions was analyzed in [51], while discretization and convergence properties, including convergence rates, were investigated in [52, 53, 54]. Chambolle and Pock provide a comprehensive overview of TV discretization and convergence techniques in [55]. In the present work, we focus on the study of IPs with finite data.
II Theoretical Formulation
II-A Mixed-Derivative Native Space
We take the unit square as our domain. We define the native space as
(8) |
where the gradient and the mixed derivative are required to be bounded measures supported in the closed unit square. The second condition in (8) enforces that the function vanishes outside the square. In Proposition 1, we provide a more practical, integral-based characterization of the functions belonging to the native space. To this end, we introduce the integral transform defined, for every bounded measure, by
(9) |
where $H$ denotes the Heaviside step function.
Proposition 1.
A function belongs to if and only if it belongs to
(10) |
Proof of Proposition 1.
Clearly, any function satisfying (10) belongs to the native space. To prove the converse, take a function in the native space and consider the transform (9) of its mixed derivative. We claim that
(11) |
which proves the proposition. Observe that the function and its transform may differ only by an element of the null space of the mixed derivative. Hence, one can write
(12) |
for some element of that null space. Note that
(13) |
which directly implies that the null-space component vanishes and proves the claim. ∎
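The discrete analogue of the transform (9) is a running cumulative sum: applying it to the mixed finite differences of an array recovers the array, which mirrors Proposition 1. A small self-contained check (our construction, not code from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
f = rng.standard_normal((6, 6))

# Discrete mixed derivative: second-order mixed differences with zero padding.
pad = np.pad(f, ((1, 0), (1, 0)))
mu = np.diff(np.diff(pad, axis=0), axis=1)

# Discrete Heaviside integration: cumulative sums along both axes.
f_rec = np.cumsum(np.cumsum(mu, axis=0), axis=1)
assert np.allclose(f, f_rec)  # the cumulative sum inverts the mixed difference
```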
For a fixed tradeoff parameter, we equip the native space with the mixed norm defined as
(14) |
with
(15) |
In the sequel, we drop the explicit dependence on the parameter when it is clear from the context. The key properties of the native space are summarized in Theorem 1.
Theorem 1.
Fix the tradeoff parameter. Then, the vector space satisfies the following properties:
1. All mixed norms are well defined and equivalent; that is, for any two values of the tradeoff parameter, there exists a constant such that
(16)
2. It is a Banach space.
3. Equipped with the mixed norm, it is the dual space of the quotient space defined in
(17)
which is equipped with the norm
(18)
Proof of Theorem 1.
Item 1. Fix two values of the tradeoff parameter and a function in the space. Then,
(19) |
where the constant depends only on the two parameters. Consequently, (16) holds with
(20), (21)
Item 2. We know from [32] that the ambient space is a Banach space for its norm. Since the native space is a closed subspace, it is also a Banach space for that norm. By the norm equivalence, it follows that it is a Banach space for the mixed norm.
Item 3. Here, all quotient spaces are equipped with their quotient norms. We know from [32] that the ambient space, equipped with its norm, is the dual of a quotient space, with
(22) |
Since the native space is a Banach subspace, its predual is given by the (pre-)annihilator
(23) |
∎
The key takeaway of Theorem 1 is that the mixed norms are topologically equivalent for all values of the tradeoff parameter. We have also shown that the native space admits a weak* topology, in which a sequence converges if and only if
(24) |
Observe that, even though the quotient seems to complicate the predual structure, one still has weak* convergence if and only if
(25) |
because changing the representative in the quotient class, by definition, does not change the value of the duality brackets.
Next, we discuss the relation of the native space with the space of functions of bounded variation, which is naturally equipped with the BV norm. We recall that, for a domain,
(26) |
Proposition 2.
For a domain compactly contained in the unit square, the following holds:
1. the BV functional is a norm on the native space;
2. the native space is a subspace of the BV space, i.e.,
(27)
3. this subspace is not closed.
Proof of Proposition 2.
Item 1. Absolute homogeneity, the triangle inequality, and non-negativity are straightforward. In addition,
(28) |
for some constant. Since the functions of the native space are compactly supported, the norm vanishes only for the zero function.
Item 2.
It follows from Item 1 of Theorem 1 that the gradient is a bounded measure. Because the domain is bounded, the function is also integrable. The same argument applies to conclude the inclusion (27).
Item 3.
We define
(29)
with suitably chosen amplitudes. Then, a straightforward calculation shows that the sequence of partial sums converges in the BV norm to a staircase function with an accumulation of steps towards the axis. In addition, we calculate that
(30)
which is an unbounded measure. The limit therefore lies outside the native space; consequently, the latter is not closed. ∎
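A discrete illustration of this counterexample (our construction): the indicator of a staircase with dyadic steps keeps a bounded perimeter, while the mass of its mixed differences, which counts corners, grows with the number of steps.

```python
import numpy as np

def staircase(n: int, size: int) -> np.ndarray:
    """Indicator of the region under a dyadic staircase with n steps."""
    c = np.zeros((size, size))
    x0 = 0
    for k in range(1, n + 1):
        x1 = size - size // 2**k        # right edge of the k-th step
        c[x0:x1, : size // 2**k] = 1.0  # step height halves at each step
        x0 = x1
    return c

def tv(c: np.ndarray) -> float:
    return np.abs(np.diff(c, axis=0)).sum() + np.abs(np.diff(c, axis=1)).sum()

def corner_mass(c: np.ndarray) -> float:
    return np.abs(np.diff(np.diff(c, axis=0), axis=1)).sum()

for n in (4, 6, 8, 10):
    c = staircase(n, size=4096)
    # Scaled TV (perimeter) stays bounded; the corner mass grows with n.
    print(n, tv(c) / 4096, corner_mass(c))
```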
II-B Sets, Indecomposable Sets and Approximations
We study the geometrical properties of sets whose indicator functions belong to the native space. As a first step, we show that such an indicator function, as an element of the native space, forms an equivalence class.
Proposition 3.
Let a measurable set be given. If another measurable set differs from it by a set of measure zero, then
(31) |
In particular, if the indicator of the first set belongs to the native space, then so does that of the second.
Proof of Proposition 3.
We calculate that
(32)–(35)
In (32), we used the assumption that the two sets differ by a set of measure zero. In (33), we used the density of the test functions and the fact that a linear and continuous functional admits a unique (linear and continuous) extension. In (34), we used Item 3 of Theorem 1. In (35), we used Item 1 of Theorem 1. Finally,
(36) |
∎
The fact that an indicator forms an equivalence class is left implicit in the sequel, as the relevant representative is usually clear from the context. Proposition 4 shows that such indicators must take the form of a polygon with finitely many corners and edges parallel to the $x$- or the $y$-axis.
Proposition 4.
If an indicator belongs to the native space, then its mixed derivative is a discrete measure with finitely many atoms. In addition, one has the representation
(37) |
with atoms of amplitude ±1.
Proof of Proposition 4.
Step 1. We define
(38) |
Observe that in (38) the value is set by measuring the closed square. Other choices, such as the half-open square, would yield functions that fall in the same equivalence class.
Step 2.
We assume by contradiction that there exists a set of the form
(39) |
such that the desired property fails. Observe that one then has that
(40) |
Consequently, we distinguish three cases:
• in the first case, we directly reach a contradiction;
• in the second case,
– if the first subcase holds, then we find a contradiction as before;
– if the second subcase holds, then the computed value yields a contradiction;
• in the third case, we reach a contradiction as in the first one.
Further, one can prove that the measure
1. is integer-valued on intersections of sets of the form (39);
2. is integer-valued on unions of sets of the form (39);
3. is integer-valued on the relative complement of a set of the form (39).
It follows that the measure is integer-valued on any Borel set [56, Proposition 1.1.5]. This, combined with its boundedness, implies that [57, Chapters 1-2]
(41) |
The fact that the function is a $\{0,1\}$-valued indicator finally implies that the amplitudes lie in $\{\pm 1\}$. ∎
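Proposition 4 can be observed numerically: the mixed differences of the indicator of an axis-aligned polygon vanish everywhere except at its corners, where they take the values ±1. A sketch with our own helper names:

```python
import numpy as np

# Indicator of an L-shaped, axis-aligned polygon on a pixel grid.
c = np.zeros((10, 10))
c[2:8, 2:5] = 1.0
c[2:5, 2:8] = 1.0

dxy = np.diff(np.diff(c, axis=0), axis=1)  # discrete mixed derivative
atoms = {(i, j): int(v) for (i, j), v in np.ndenumerate(dxy) if v != 0}
print(atoms)       # six atoms with amplitudes +1 or -1
print(len(atoms))  # 6: the L-shape has six corners
```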
One important class is formed by indecomposable sets [23, 26], which we relabel in Definition 1 to emphasize the underlying norm. These sets are central in our description of the unit ball, as the indecomposability property translates into an extreme-point structure. We extend this class in Definition 2.
Definition 1.
Let a set with finite norm be given. The set is -decomposable if there exists a partition into two sets of positive measure such that
(42) |
and the corresponding norms are additive. If the set is not -decomposable, it is -indecomposable. The set of all -indecomposable sets is denoted by
(43) |
Definition 2.
Let a set with finite norm be given. The set is -decomposable if there exists a partition into two sets of positive measure such that
(44) |
and the corresponding norms are additive. If the set is not -decomposable, it is -indecomposable. The set of all -indecomposable sets is denoted by
(45) |
The relation between the two notions of indecomposability is revealed in Proposition 5.
Proposition 5.
The strict inclusion holds true
(46) |
Proof of Proposition 5.
Item 1. We provide an example of a -indecomposable set that is not -indecomposable. To do so, we define
(47) | |||
(48) |
as well as their union. On the one hand, the union is -decomposable since
(49) |
On the other hand, we know from [26, Proposition 2] that is -indecomposable.
Item 2. We show that any -decomposition is also a -decomposition. It follows from Proposition 4 that the associated mixed derivatives are discrete measures with finitely many atoms, such that
(50) |
Assume, by contradiction, that the partition fails to be a -decomposition. Then, there exists a point such that
(51) |
This means that the point is a corner of both sets of the partition. Observe that, for the partition to be a -decomposition, the edges of the two sets may overlap only at corners, and not on segments. This implies that the two corners face each other, in which case the corresponding Dirac amplitudes must have the same value. This is a contradiction.
∎
Indicator functions are not only useful for the indecomposable structure, but also for their approximation properties. We conclude this section with Lemma 1, which states that any function can be approximated, in an appropriate sense, by linear combinations of indicator functions. The proof is given in Appendix A.
Lemma 1.
For any function in the native space, there exists a sequence of functions, whose mixed derivatives are discrete measures supported on a grid, such that
1. it is weak*-convergent to the function in the native space;
2. convergence holds over the mixed-derivative norms;
3. it is convergent to the function in the BV norm; and
4. convergence holds over the gradient norms.
II-C Coarea and Cocorner Formulas
In the sequel, we will use the (convex) cone of positive functions, defined as
(52) |
which is closed in both the strong and the weak* topologies.
We develop tools to split a function into a sum of indicator functions on indecomposable sets. To this end, we first recall the coarea formula. For a function and a threshold, we define the level-set functions
(53) |
and
(54) |
Theorem 2.
The two functions are continuous, respectively increasing and decreasing, and satisfy the equalities
(55) |
(56) |
In addition, they satisfy the corresponding coarea formula
(57) |
The coarea formula, for the norm on the gradient, was proved for Lipschitz functions in [58], for BV functions in [59], and was later extended to general norms (see, for example, [60]). Equations (55) and (56) follow from applying the coarea formula to the two level-set functions.
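A discrete sanity check of the coarea formula (57) (our construction): for an integer-valued array, the discrete TV equals the sum, over the integer thresholds, of the TV of the thresholded indicators.

```python
import numpy as np

def tv(c: np.ndarray) -> float:
    return np.abs(np.diff(c, axis=0)).sum() + np.abs(np.diff(c, axis=1)).sum()

rng = np.random.default_rng(1)
f = rng.integers(0, 5, size=(32, 32)).astype(float)  # levels in {0,...,4}

# Coarea: TV(f) equals the sum over thresholds t of TV of the set {f > t}.
coarea = sum(tv((f > t).astype(float)) for t in range(4))
assert np.isclose(tv(f), coarea)
```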
Next, we provide a new formula inspired by the coarea one. We define two corner-counting functions as
(58) |
(59) |
In Lemma 2, we adapt Theorem 2 to the mixed derivative and to piecewise-constant functions.
Lemma 2.
Let a function be given whose mixed derivative is a discrete measure with finitely many atoms. Then, the two functions are continuous, respectively increasing and decreasing, and satisfy the equalities
(60) |
(61) |
In addition, they verify the cocorner formula
(62) |
Proof of Lemma 2.
We introduce some notation and define
(63) |
Observe that the function takes finitely many values. Then, one has that
(64) |
For thresholds between consecutive values, we calculate that
(65) |
It follows directly from the representation (II-C) that the first function is increasing and continuous. Next, we observe that the mixed derivative may only be non-zero on the corners. In addition, a point is in the support of two consecutive level indicators if and only if it is a corner of both, in which case the Dirac masses centered at it must have amplitudes of the same sign. It follows from this remark and (II-C) that
(66) |
and, consequently, the first equality. A similar argument shows that the second function is continuous, decreasing, and that
(67) |
Finally,
(68) |
∎
Lemma 3.
If the function belongs to the positive cone, then the two functions are continuous and respectively increasing and decreasing. In addition, they verify the equality
(69) |
Proof of Lemma 3.
Step 1. Consider a sequence of functions converging as in Lemma 1. Without loss of generality, we assume positivity. Our Claim 1 is that, up to a subsequence,
(70) |
and
(71) |
To prove this, we first recall that
(72) |
Item 2 of Lemma 1, together with the triangle inequality, yields the convergence of the corresponding total masses. In particular,
(73) |
Remark that
(74) |
and that this set of test functions is dense. Furthermore, it follows from Item 2 of Lemma 1 that
(75) |
Consequently, the first sequence of measures is uniformly bounded and, from (74), weak*-convergent. A similar argument shows that the second sequence is uniformly bounded and weak*-convergent. It follows that
(76) |
and that
(77) |
In addition,
(78), (79)
Finally, if either (76) or (77) has a strict inequality, we conclude that
(80)–(82)
which is a contradiction. In (80) we used the fact that
(83) |
and the triangle inequality. In (82) we used (79). Consequently, (76) and (77) hold with an equality and, up to a subsequence, (70) and (71) hold.
Step 2. The first function is increasing. Indeed, Lemma 2 yields that,
(84) |
where we used Claim 1 to pass to the limit. The claim that is decreasing and (69) are proved likewise.
Step 3. We now show that the two functions are continuous. Let a sequence of thresholds converge to a limit. An argument similar to the one in Step 1 yields that (in parallel with (76) and (77))
(85) |
from which it follows that
(86)–(88)
Equation (87) follows from (69), and the inequality in (86) cannot be strict. It follows that
(89) |
∎
II-D Extreme Point Characterization
We are now in a position to state our main result, Theorem 3.
Theorem 3.
The extreme points of the unit ball in the positive cone, equipped with the mixed norm, are exactly the suitably normalized indicator functions and,
• in one regime of the tradeoff parameter, the underlying set is -indecomposable;
• in the complementary regime, it is -indecomposable.
Proof of Theorem 3.
Step 1. We show that an extreme point must be a scaled indicator function. Let an extreme point be given and consider the functions
(90) |
(91) |
We know that these two functions are continuous and respectively increasing and decreasing. In addition, they satisfy the equality
(92) |
By continuity, there exists a threshold at which they agree, with the decomposition
(93) |
We find that
(94) |
and that
(95) |
Since we started from an extreme point, we must have that
(96) |
In either case, the resulting equations imply that the extreme point has to be a scaled indicator function. Finally, assume by contradiction that the underlying set is not indecomposable in the relevant sense. Then, there exist two sets of positive measure such that
(97) |
and
(98) |
We set up the corresponding splitting and find that
(99) |
which contradicts the assumption that we started from an extreme point. The other case is similar and omitted.
Step 2. We show that such a scaled indicator is an extreme point. Assume by contradiction that it is not. Then, there exist a weight and two functions in the unit ball such that
(100) |
Let a measurable set be given. We define
(101) |
We must have
(102) |
otherwise we would reach the contradiction
(103) |
which is impossible.
In the first case, (102) implies that the gradients of the two functions must vanish in the interior of the set. It follows from the Constancy Theorem [61], [23, Lemma 4.6] that both functions are constant on it. Finally, since they vanish outside the set and average to the indicator, they must both equal the indicator itself. This contradicts the assumption that the decomposition is non-trivial.
In the second case, the mixed derivatives of the two functions must be supported in the finite set of corners of the polygon. The functions themselves must be piecewise constant on the set and vanish outside. Assume, by contradiction, that they are not constant on it. Then,
(104) |
and, by construction, the splitting is non-trivial. We claim that there exists an index, without loss of generality the first one, such that
(105) |
Assume by contradiction that no such index exists. Then, there exist a corner and two indices, without loss of generality the first two, such that
(106) |
This implies that the associated Dirac masses must cancel each other, which constrains their amplitudes. We apply the same argument to find a further index such that
(107) |
In particular, two of the sets share a corner, whose Dirac masses must therefore cancel. This happens only under a condition that contradicts the construction. Thus, the claimed index exists. Finally, the resulting splitting contradicts the initial assumption.
∎
In Corollary 1, we leverage the characterization of extreme points from Theorem 3 to analyze the solution set of a data-fidelity term regularized by our norm.
Corollary 1.
Let the loss functional be defined as
(108) |
with
• a fixed-dimensional vector representing the available measurements;
• a data-fidelity functional that is proper, coercive, continuous, and strictly convex in its second argument;
• a measurement operator that is linear and continuous.
Then, the solution set
(109) |
is nonempty, convex, and weak*-compact. Moreover, its extreme points are of the form
(110) |
where the underlying sets are indecomposable in the regime-dependent sense of Theorem 3.
Proof of Corollary 1.
Consider a minimizing sequence such that
(111) |
which exists because the loss is coercive and proper. It follows that the sequence is bounded and, by norm equivalence, bounded in the mixed norm. According to the Banach–Alaoglu theorem, up to a subsequence, it is weak*-convergent to some limit. It follows from the weak* lower semicontinuity of the loss that
(112) |
Consequently, the solution set is nonempty, and the characterization of its extreme points follows from [22, Corollary 3.8] and Theorem 3. ∎
The mixed-derivative term penalizes the number of corners of the sets, whereas the TV term penalizes their perimeter. Each has a distinct regularization effect on the solutions of the optimization problem. Informally, the former promotes solutions with simpler sets, while the latter favors solutions with smaller sets. The balance between the two effects is set by the tradeoff parameter which, in practice, must be tuned.
We conclude this section with Proposition 6, which identifies an important class of weak*-continuous measurement operators.
Proposition 6.
If the measurement kernels belong to the predual space, then the operator
(113) |
is weak*-continuous.
Proof of Proposition 6.
The claim follows directly from the characterization (24) of weak* convergence. ∎
III Resolution of the Optimization Problem
The resolution of the problem in (109) is not directly feasible, as the underlying search space is infinite-dimensional. Moreover, the fact that the edges and corners of the solutions in (110) may lie anywhere in the domain makes their estimation challenging.
III-A Pixel-Based Exact Discretization
We discretize the search space into a finite-dimensional cone of functions of the form (110) whose corners lie on the dyadic grid
(116) |
with dyadic knots and associated pixels
(117) |
The discretized search space is
(118) |
and the associated discretized optimization problem is
(119) |
Our objective is now to reformulate (119) as an optimization problem over the coefficients. We note that any function of the discretized space can be uniquely expressed as
(120) |
which corresponds to the tensor product of B-splines of order 1. Furthermore, we introduce the synthesis operator
(121) |
and its adjoint, the analysis operator
(122) |
The synthesis and analysis operators depend implicitly on the grid. In the sequel, we shall occasionally regard the coefficients as a sequence indexed over the integers, with entries set to zero outside the grid. We consider the new optimization problem
(123) |
where
(124) |
with
(125) |
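A minimal sketch of the synthesis operator and its adjoint on a dyadic pixel grid, assuming (as stated in the abstract) that the basis functions are tensor products of first-order B-splines, i.e., pixel indicators; the names and the normalization are our own choices.

```python
import numpy as np

def synthesis(c: np.ndarray, size: int) -> np.ndarray:
    """Map pixel coefficients on a coarse grid to a piecewise-constant image."""
    r = size // c.shape[0]
    return np.kron(c, np.ones((r, r)))  # each coefficient paints an r x r pixel

def analysis(f: np.ndarray, n: int) -> np.ndarray:
    """Adjoint of synthesis (up to the pixel area): sum f over each pixel."""
    r = f.shape[0] // n
    return f.reshape(n, r, n, r).sum(axis=(1, 3))

# Adjoint check: <synthesis(c), g> == <c, analysis(g)>.
c = np.arange(16.0).reshape(4, 4)
g = np.random.default_rng(2).standard_normal((8, 8))
assert np.isclose((synthesis(c, 8) * g).sum(), (c * analysis(g, 4)).sum())
```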
A key feature of this formulation, as shown in Proposition 7, is that the problem in (123) is the exact discretization of the problem in (119), i.e., it incurs no discretization error.
Proposition 7.
III-B Convergence
The solution obtained by solving the discretized optimization problem (119) is only an approximation of a continuous-domain solution. Therefore, for this exact grid-based discretization to be reliable and practical, one must be confident that, in some sense,
(134) |
This convergence is established in Theorem 4.
Theorem 4.
For any sequence of discretized solutions on successively refined grids and for any continuous-domain solution, the following convergences hold:
1. convergence holds over the loss functional;
2. convergence holds over the simulated measurements.
Proof of Theorem 4.
Item 1. We fix a solution and use Lemma 1 to find a sequence of grid-compatible functions that is weak*-convergent to it with convergent norms. Then, the weak* convergence implies that
(135) |
It follows from (135) that
(136) |
Item 2. The convergence in Item 1 implies that the sequence of losses is bounded. In turn, this boundedness, combined with the assumptions on the data-fidelity functional, implies that the sequence of solutions is bounded in norm. By the Banach–Alaoglu theorem, we extract a weak*-convergent subsequence and find that
(137) |
for some limit. If one assumes by contradiction that the sequence of simulated measurements does not converge, then a classical argument [30, Theorem 3], which leverages the strict convexity of the data-fidelity functional, leads to a contradiction. ∎
IV Application to Denoising
IV-A General Description
Our goal is to denoise an image, which is assumed to be square and composed of pixels
(138) |
at a given resolution. The continuous-domain optimization problem is formulated as
(139) |
where the data-fidelity and regularization terms are specified by
(140) |
IV-B Resolution on Finite Grid
We argue that, to solve the continuous-domain problem in (139), it is sufficient to find the solution on the canonical pixel grid. As a first step, we show in Theorem 5 that the sets of solutions on successive grids are embedded one into another.
Theorem 5.
For all grid sizes, the inclusions hold true
(143) |
Proof of Theorem 5.
We define the downsampling operator
(144) |
with entries equal to
(145) |
Step 1. We claim that, with the above notation,
(146) |
A straightforward calculation yields the value of the downsampled coefficients and, consequently, that
(147) |
Next, we get from (127) that
(148) |
Substituting (145) into (148) and applying the triangle inequality several times (nine, to be precise) yields that
(149) |
Finally, summing (149) over the grid yields that
(150) |
and a similar calculation shows that
(151) |
The claim (146) follows from the combination of (147), (150), and (151).
Step 2.
Fix two grid levels. One can iterate Step 1 to find that
(152) |
where the left-hand inequality follows from the grid inclusion and the right-hand inequality follows from (146). In addition, it follows from the inclusion
(153) |
that the discrete losses agree, which, combined with (152), proves the first inclusion. Finally, if a sequence of solutions on refined grids is given, then, for some level,
(154) |
One can iterate (146) to find that
(155) |
The combination of (155) and (154) yields the second inclusion. ∎
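A sketch of a dyadic downsampling operator of the kind used in the proof, mapping pixel coefficients on a fine grid to the next coarser one by averaging 2 x 2 blocks; this is our assumed reading of (144)-(145). The inequality (146) then expresses that this coarsening cannot increase the (suitably scaled) discrete loss.

```python
import numpy as np

def downsample(c: np.ndarray) -> np.ndarray:
    """Average 2 x 2 blocks of pixel coefficients (one dyadic level down)."""
    n = c.shape[0]
    return c.reshape(n // 2, 2, n // 2, 2).mean(axis=(1, 3))

c = np.random.default_rng(3).standard_normal((8, 8))
print(downsample(c).shape)  # (4, 4)
```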
Theorem 5 reveals that a solution of the continuous-domain problem can be found by solving the optimization problem on the canonical pixel grid. The optimization over this grid has a unique solution.
Proposition 8.
There is a unique solution whose knots lie on the canonical grid, i.e.,
(156) |
IV-C Numerical Experiments
We fix the data fidelity as the squared difference and study the optimization problem in
(158) |
where the regularization is defined in (142) and the data is the image to denoise. We discretize this optimization problem, as in (141), on the canonical pixel basis; it has a unique solution (Theorem 5 and Proposition 8). Then, we consider the synthesis formulation (123) of the discretized optimization problem
(159) |
whose regularization term has been reparametrized for convenience, with
(160) |
The convolution in (159) is applied elementwise on the rows of the coefficient matrix. Finally, since there is no closed form for the proximal operator of the regularizer, we solve the dual problem, which we optimize with the FISTA algorithm [12]. In practice, our method extends without any issue to images that are not square. The denoising efficiency of our method, labeled MTV, is assessed on the BSD68 test set alongside two benchmark methods. The results are summarized in Table I.
Table I: Average PSNR (dB) on the BSD68 test set for three noise levels (columns ordered from low to high noise).
TV | 36.41 | 29.91 | 27.48
MTV | 36.83 | 30.20 | 27.72
CRR-NN [63] | 36.96 | 30.55 | 28.11
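The following sketch illustrates the overall numerical approach on the simpler anisotropic-TV part of the regularizer: denoising by running FISTA on the dual problem, with a projection onto the dual-norm ball, in the spirit of [12]. It is a minimal illustration with our own variable names, not the authors' implementation; in particular, the full MTV regularizer adds the mixed-difference term, which changes the dual operator (the "additional filter" mentioned in the conclusion).

```python
import numpy as np

def grad(x):
    """Forward differences along both axes (zero at the far boundary)."""
    gx = np.zeros_like(x); gx[:-1, :] = x[1:, :] - x[:-1, :]
    gy = np.zeros_like(x); gy[:, :-1] = x[:, 1:] - x[:, :-1]
    return gx, gy

def div(px, py):
    """Negative adjoint of grad (discrete divergence)."""
    dx = np.zeros_like(px); dx[0] = px[0]; dx[1:-1] = px[1:-1] - px[:-2]; dx[-1] = -px[-2]
    dy = np.zeros_like(py); dy[:, 0] = py[:, 0]; dy[:, 1:-1] = py[:, 1:-1] - py[:, :-2]; dy[:, -1] = -py[:, -2]
    return dx + dy

def tv_denoise_dual_fista(y, lam, n_iter=200):
    """Anisotropic TV denoising, min 0.5||x-y||^2 + lam*TV(x), via dual FISTA."""
    px, py = np.zeros_like(y), np.zeros_like(y)
    qx, qy = px.copy(), py.copy()
    t, L = 1.0, 8.0  # L bounds the norm of the composed difference operator
    for _ in range(n_iter):
        x = y + div(qx, qy)                       # primal estimate
        gx, gy = grad(x)
        px_new = np.clip(qx + gx / L, -lam, lam)  # project onto the dual ball
        py_new = np.clip(qy + gy / L, -lam, lam)
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        qx = px_new + (t - 1.0) / t_new * (px_new - px)
        qy = py_new + (t - 1.0) / t_new * (py_new - py)
        px, py, t = px_new, py_new, t_new
    return y + div(px, py)

rng = np.random.default_rng(4)
clean = np.zeros((64, 64)); clean[16:48, 16:48] = 1.0
noisy = clean + 0.1 * rng.standard_normal(clean.shape)
den = tv_denoise_dual_fista(noisy, lam=0.2)
print(np.abs(noisy - clean).mean(), np.abs(den - clean).mean())  # error decreases
```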
To investigate the choice of the hyper-parameter, we consider, for each image and noise level, the value that maximizes the PSNR, and provide in Table II statistics on its distribution.
Table II: Statistics of the PSNR-maximizing hyper-parameter across BSD68.
Expectation | 0.37 | 0.33 | 0.33 | 0.04
Standard deviation | 0.07 | 0.08 | 0.12 | 0.04
We observe from the expectations in Table II that our method does not reduce to the TV one. The optimal convex combination of the TV-based and mixed-derivative-based regularizations is non-trivial and appears necessary for the efficient denoising of natural images. Furthermore, the last column shows that the choice of the optimal hyper-parameter is stable with respect to the noise level, which, in turn, implies that it depends mostly on the geometry of the image rather than on the noise level.
We observe that the images in BSD68 for which MTV significantly outperforms TV, by a sizable margin in dB, typically feature corner-like structures, such as in Image 1.
[Image 1: a BSD68 image with corner-like structures.]
To better illustrate this behavior, we consider a rights-free image of New York City, as shown in Figure 2. For the tested noise level, the TV and MTV methods yield PSNR values of 24.70 and 25.52, respectively. We observe that bands of constant values and corners are reconstructed better with MTV.
[Figure 2: denoising of the New York City image; TV versus MTV reconstructions.]
V Conclusion
In this paper, we have shown that the convex combination of the TV regularization with the mixed-derivative term promotes piecewise-constant functions with edges parallel to the $x$- or the $y$-axis as solutions to a continuous-domain optimization problem. In the particular case of denoising, we revealed that the optimization problem discretized on the canonical, pixel-based grid has a unique solution, which is also a solution of the continuous-domain problem. Finally, we demonstrated that the discretized problem can be solved with the same algorithm as for TV, with an additional filter, and that our new regularization significantly improves the reconstruction quality when the image has the appropriate underlying geometry.
Acknowledgment
Vincent Guillemet was supported by the Swiss National Science Foundation (SNSF) under Grant 200020_219356.
References
- [1] M. Bertero, P. Boccacci, and C. De Mol, Introduction to Inverse Problems in Imaging. CRC press, 2021.
- [2] M. T. McCann, M. Unser et al., “Biomedical image reconstruction: the foundations to deep neural networks,” Foundations and Trends® in Signal Processing, vol. 13, no. 3, pp. 283–359, 2019.
- [3] H. Gupta, J. Fageot, and M. Unser, “Continuous-domain solutions of linear inverse problems with Tikhonov versus generalized TV regularization,” IEEE Transactions on Signal Processing, vol. 66, no. 17, pp. 4670–4684, 2018.
- [4] M. Unser and J. Fageot, “Native Banach spaces for splines and variational inverse problems,” arXiv preprint arXiv:1904.10818, 2019.
- [5] L. I. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variation based noise removal algorithms,” Physica D: Nonlinear Phenomena, vol. 60, no. 1-4, pp. 259–268, 1992.
- [6] L. Ambrosio, N. Fusco, and D. Pallara, Functions of Bounded Variation and Free Discontinuity Problems. Oxford University Press, 2000.
- [7] L. Evans, Measure Theory and Fine Properties of Functions. Routledge, 2018.
- [8] W. Rudin, Real and Complex Analysis, 3rd ed. New York: McGraw-Hill, 1987.
- [9] D. L. Donoho, “Compressed sensing,” IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289–1306, 2006.
- [10] E. Candes and J. Romberg, “Sparsity and incoherence in compressive sampling,” Inverse Problems, vol. 23, no. 3, p. 969, 2007.
- [11] A. M. Bruckstein, D. L. Donoho, and M. Elad, “From sparse solutions of systems of equations to sparse modeling of signals and images,” SIAM Review, vol. 51, no. 1, pp. 34–81, 2009.
- [12] A. Beck and M. Teboulle, “Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems,” IEEE Transactions on Image Processing, vol. 18, no. 11, pp. 2419–2434, 2009.
- [13] M. Zhu and T. Chan, “An efficient primal-dual hybrid gradient algorithm for total variation image restoration,” Ucla Cam Report, vol. 34, no. 2, 2008.
- [14] E. Esser, X. Zhang, and T. F. Chan, “A general framework for a class of first order primal-dual algorithms for convex optimization in imaging science,” SIAM Journal on Imaging Sciences, vol. 3, no. 4, pp. 1015–1046, 2010.
- [15] A. Chambolle and T. Pock, “A first-order primal-dual algorithm for convex problems with applications to imaging,” Journal of Mathematical Imaging and Vision, vol. 40, pp. 120–145, 2011.
- [16] L. Condat, “A primal–dual splitting method for convex optimization involving lipschitzian, proximable and linear composite terms,” Journal of Optimization Theory and Applications, vol. 158, no. 2, pp. 460–479, 2013.
- [17] R. Caccioppoli, “Misura e integrazione sugli insiemi dimensionalmente orientati,” Atti Accad. Naz. Lincei. Rend. Cl. Sci. Fis. Mat. Nat.(8), vol. 12, pp. 3–11, 1952.
- [18] E. De Giorgi, “Su una teoria generale della misura (r- 1)-dimensionale in uno spazio ad r dimensioni,” Annali di Matematica Pura ed Applicata, vol. 36, pp. 191–213, 1954.
- [19] A. Flinth and P. Weiss, “Exact solutions of infinite dimensional total-variation regularized problems,” Information and Inference: A Journal of the IMA, vol. 8, no. 3, pp. 407–443, 2019.
- [20] M. Unser, J. Fageot, and J. P. Ward, “Splines are universal solutions of linear inverse problems with generalized TV regularization,” SIAM Review, vol. 59, no. 4, pp. 769–793, 2017.
- [21] M. Unser, “A unifying representer theorem for inverse problems and machine learning,” Foundations of Computational Mathematics, vol. 21, no. 4, pp. 941–960, 2021.
- [22] C. Boyer, A. Chambolle, Y. D. Castro, V. Duval, F. De Gournay, and P. Weiss, “On representer theorems and convex regularization,” SIAM Journal on Optimization, vol. 29, no. 2, pp. 1260–1281, 2019.
- [23] K. Bredies and M. Carioni, “Sparsity of solutions for variational inverse problems with finite-dimensional data,” Calculus of Variations and Partial Differential Equations, vol. 59, no. 1, p. 14, 2020.
- [24] W. H. Fleming, “Functions with generalized gradient and generalized surfaces,” Annali di Matematica Pura ed Applicata, vol. 44, no. 1, pp. 93–103, 1957.
- [25] ——, “Functions whose partial derivatives are measures,” Illinois Journal of Mathematics, vol. 4, no. 3, pp. 452–478, 1960.
- [26] L. Ambrosio, V. Caselles, S. Masnou, and J.-M. Morel, “Connected components of sets of finite perimeter and applications to image processing,” Journal of the European Mathematical Society, vol. 3, no. 1, pp. 39–92, 2001.
- [27] M. Unser, “Splines: A perfect fit for signal and image processing,” IEEE Signal Processing Magazine, vol. 16, no. 6, pp. 22–38, 1999.
- [28] T. Debarre, J. Fageot, H. Gupta, and M. Unser, “B-spline-based exact discretization of continuous-domain inverse problems with generalized TV regularization,” IEEE Transactions on Information Theory, vol. 65, no. 7, pp. 4457–4470, 2019.
- [29] T. Debarre, Q. Denoyelle, and J. Fageot, “On the uniqueness of solutions for the basis pursuit in the continuum,” Inverse Problems, vol. 38, no. 12, p. 125005, 2022.
- [30] V. Guillemet, J. Fageot, and M. Unser, “Convergence analysis of the discretization of continuous-domain inverse problems,” Inverse Problems, vol. 41, no. 4, p. 045008, 2025.
- [31] J. Fageot, “Variational seasonal-trend decomposition with sparse continuous-domain regularization,” arXiv preprint arXiv:2505.10486, 2025.
- [32] V. Guillemet and M. Unser, “Variational tensor product splines,” En Preparation, 2025.
- [33] K. Bredies, K. Kunisch, and T. Pock, “Total generalized variation,” SIAM Journal on Imaging Sciences, vol. 3, no. 3, pp. 492–526, 2010.
- [34] K. Bredies and M. Holler, “Regularization of linear inverse problems with total generalized variation,” Journal of Inverse and Ill-Posed Problems, vol. 22, no. 6, pp. 871–913, 2014.
- [35] F. Demengel, “Fonctions à hessien borné,” in Annales de l’institut Fourier, vol. 34, no. 2, 1984, pp. 155–190.
- [36] S. Lefkimmiatis, A. Bourquard, and M. Unser, “Hessian-based norm regularization for image restoration with biomedical applications,” IEEE Transactions on Image Processing, vol. 21, no. 3, pp. 983–995, 2011.
- [37] S. Lefkimmiatis, J. P. Ward, and M. Unser, “Hessian Schatten-norm regularization for linear inverse problems,” IEEE Transactions on Image Processing, vol. 22, no. 5, pp. 1873–1888, 2013.
- [38] S. Aziznejad, J. Campos, and M. Unser, “Measuring complexity of learning schemes using Hessian-Schatten total variation,” SIAM Journal on Mathematics of Data Science, vol. 5, no. 2, pp. 422–445, 2023.
- [39] J. Campos, S. Aziznejad, and M. Unser, “Learning of continuous and piecewise-linear functions with Hessian total-variation regularization,” IEEE Open Journal of Signal Processing, vol. 3, pp. 36–48, 2022.
- [40] M. Pourya, A. Goujon, and M. Unser, “Delaunay-triangulation-based learning with Hessian total-variation regularization,” IEEE Open Journal of Signal Processing, vol. 4, pp. 167–178, 2023.
- [41] M. Pourya, A. Boquet-Pujadas, and M. Unser, “A box-spline framework for inverse problems with continuous-domain sparsity constraints,” IEEE Transactions on Computational Imaging, 2024.
- [42] L. Ambrosio, C. Brena, and S. Conti, “Functions with bounded Hessian–Schatten variation: Density, variational, and extremality properties,” Archive for Rational Mechanics and Analysis, vol. 247, no. 6, p. 111, 2023.
- [43] L. Ambrosio, S. Aziznejad, C. Brena, and M. Unser, “Linear inverse problems with Hessian–Schatten total variation,” Calculus of Variations and Partial Differential Equations, vol. 63, no. 1, p. 9, 2024.
- [44] M. Unser and S. Aziznejad, “Convex optimization in sums of Banach spaces,” Applied and Computational Harmonic Analysis, vol. 56, pp. 1–25, 2022.
- [45] L. E. Dubins, “On extreme points of convex sets,” Journal of Mathematical Analysis and Applications, vol. 5, no. 2, pp. 237–244, 1962.
- [46] R. Parhi and M. Unser, “The sparsity of cycle spinning for wavelet-based solutions of linear inverse problems,” IEEE Signal Processing Letters, vol. 30, pp. 568–572, 2023.
- [47] M. Unser and S. Ducotterd, “Universal architectures for the learning of polyhedral norms and convex regularizers,” arXiv preprint arXiv:2503.19190, 2025.
- [48] G. Bellettini, V. Caselles, and M. Novaga, “The total variation flow in $\mathbb{R}^N$,” Journal of Differential Equations, vol. 184, no. 2, pp. 475–525, 2002.
- [49] F. Andreu, C. Ballester, V. Caselles, and J. M. Mazón, “Minimizing total variation flow,” 2001.
- [50] F. Andreu, V. Caselles, J. I. Díaz, and J. M. Mazón, “Some qualitative properties for the total variation flow,” Journal of Functional Analysis, vol. 188, no. 2, pp. 516–547, 2002.
- [51] V. Caselles, A. Chambolle, and M. Novaga, “Regularity for solutions of the total variation denoising problem,” Revista Matemática Iberoamericana, vol. 27, no. 1, pp. 233–252, 2011.
- [52] M.-J. Lai, B. Lucier, and J. Wang, “The convergence of a central-difference discretization of Rudin-Osher-Fatemi model for image denoising,” in Scale Space and Variational Methods in Computer Vision: Second International Conference, SSVM 2009, Voss, Norway, June 1-5, 2009. Proceedings 2. Springer, 2009, pp. 514–526.
- [53] S. Bartels, “Total variation minimization with finite elements: convergence and iterative solution,” SIAM Journal on Numerical Analysis, vol. 50, no. 3, pp. 1162–1180, 2012.
- [54] J. Wang and B. J. Lucier, “Error bounds for finite-difference methods for Rudin–Osher–Fatemi image smoothing,” SIAM Journal on Numerical Analysis, vol. 49, no. 2, pp. 845–868, 2011.
- [55] A. Chambolle and T. Pock, “Approximating the total variation with finite differences or finite elements,” in Handbook of Numerical Analysis. Elsevier, 2021, vol. 22, pp. 383–417.
- [56] D. L. Cohn, Measure Theory. Springer, 2013, vol. 2.
- [57] O. Kallenberg et al., Random Measures, Theory and Applications. Springer, 2017, vol. 1.
- [58] H. Federer, “Curvature measures,” Transactions of the American Mathematical Society, vol. 93, no. 3, pp. 418–491, 1959.
- [59] W. H. Fleming and R. Rishel, “An integral formula for total gradient variation,” Archiv der Mathematik, vol. 11, pp. 218–222, 1960.
- [60] L. Rotem, “The anisotropic total variation and surface area measures,” in Geometric Aspects of Functional Analysis: Israel Seminar (GAFA) 2020-2022. Springer, 2023, pp. 297–312.
- [61] G. Dolzmann and S. Müller, “Microstructures with finite surface energy: the two-well problem,” Archive for Rational Mechanics and Analysis, vol. 132, pp. 101–141, 1995.
- [62] S. Bonettini, S. Rebegoldi, and V. Ruggiero, “Inertial variable metric techniques for the inexact forward–backward algorithm,” SIAM Journal on Scientific Computing, vol. 40, no. 5, pp. A3180–A3210, 2018.
- [63] A. Goujon, S. Neumayer, P. Bohra, S. Ducotterd, and M. Unser, “A neural-network-based convex regularizer for inverse problems,” IEEE Transactions on Computational Imaging, vol. 9, pp. 781–795, 2023.
Appendix A: Proofs
Proof of Lemma 1.
Construction. Consider the dyadic partition of the domain with
(161) |
and the corresponding partition of the grid. We define the approximation as
(162) |
We calculate that
(163) |
and observe that, by construction, the approximation is compactly supported in the unit square. Therefore, it belongs to the native space.
Item 1.
For each test function, we calculate that
(164) |
because the function is uniformly continuous. This shows the weak* convergence.
Item 2. This observation and the weak* convergence proved in Item 1 imply that
(165) |
Item 3. We calculate that
(166) |
It follows that the family is upper-bounded by an integrable function. In addition, we calculate that
(167) |
In (-A), we used the observation that taking the measure on the closed square or the half-open square yields functions that belong to the same equivalence class. The convergence follows from the continuity from above of measures. Consequently, since the sequence is upper-bounded by an integrable function and converges pointwise almost everywhere, we conclude by the Lebesgue dominated-convergence theorem (LDC) that it converges in the integral sense.
Item 4. We first recall that
(168) |
One then has from the theory of BV functions that the total variation is equal to
(169) |
We then calculate that
(170) |
because the functions are compactly supported in the unit square, which lets us use the same argument as in (164). It follows that
(171) |
To continue, we observe that
(172) |
Equation (172) also holds with the two coordinates exchanged. For small increments, one has
(173) |
It follows that
(174) |
where the index function returns the largest grid point below its argument. A similar argument shows that
(175) |
Next, the combination of (171), (174), and (175) yields that
(176) |
with
(177) |
Finally, we observe that
(178) |
and that
(179) |
because
(180) |
Equation (179) follows from the continuity from above of the total-variation measure. It follows from (178) and (179) that one can apply the LDC. The combination of (171) and (176) implies the desired equality. ∎