ℓ1-norm Methods for Convex-Cardinality Problems
• problems involving cardinality
• the ℓ1-norm heuristic
• convex relaxation and convex envelope interpretations
• examples
• recent results
EE364b, Stanford University
ℓ1-norm heuristics for cardinality problems
• cardinality problems arise often, but are hard to solve exactly
• a simple heuristic that relies on the ℓ1-norm seems to work well
• used for many years, in many fields
– sparse design
– LASSO, robust estimation in statistics
– support vector machine (SVM) in machine learning
– total variation reconstruction in signal processing, geophysics
– compressed sensing
• new theoretical results guarantee the method works, at least for a few
problems
Cardinality
• the cardinality of x ∈ R^n, denoted card(x), is the number of nonzero components of x
• card is separable; for scalar x, card(x) = 0 if x = 0 and card(x) = 1 if x ≠ 0
• card is quasiconcave on R^n_+ (but not on R^n) since
card(x + y) ≥ min{card(x), card(y)}
holds for x, y ⪰ 0
• but otherwise has no convexity properties
• arises in many problems
General convex-cardinality problems
a convex-cardinality problem is one that would be convex, except for the
appearance of card in the objective or constraints
examples (with C, f convex):
• convex minimum cardinality problem:
minimize card(x)
subject to x ∈ C
• convex problem with cardinality constraint:
minimize f(x)
subject to x ∈ C, card(x) ≤ k
Solving convex-cardinality problems
convex-cardinality problem with x ∈ R^n
• if we fix the sparsity pattern of x (i.e., which entries are zero/nonzero)
we get a convex problem
• by solving the 2^n convex problems associated with all possible sparsity
patterns, we can solve the convex-cardinality problem (see the sketch below)
(possibly practical for n ≤ 10; not practical for n ≳ 15 or so . . . )
• general convex-cardinality problem is (NP-) hard
• can solve globally by branch-and-bound
– can work for particular problem instances (with some luck)
– in worst case reduces to checking all (or many of) the 2^n sparsity patterns
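To make the enumeration concrete, here is a minimal brute-force sketch in CVXPY, assuming C = {x : Gx ⪯ h} with small hypothetical data; each sparsity pattern fixes the zero entries of x, leaving a convex feasibility problem.

```python
# Exhaustive search over sparsity patterns: a sketch, only practical for
# small n since all 2^n patterns are checked. C = {x : Gx <= h} is hypothetical.
import itertools

import numpy as np
import cvxpy as cp

n = 6
rng = np.random.default_rng(0)
G = rng.standard_normal((10, n))
h = rng.standard_normal(10) + 1.0

best_card, best_x = n + 1, None
for pattern in itertools.product([0, 1], repeat=n):
    k = sum(pattern)
    if k >= best_card:
        continue  # this pattern cannot beat the best one found so far
    x = cp.Variable(n)
    constraints = [G @ x <= h]
    constraints += [x[i] == 0 for i in range(n) if pattern[i] == 0]
    prob = cp.Problem(cp.Minimize(0), constraints)  # pure feasibility check
    prob.solve()
    if prob.status == cp.OPTIMAL:
        best_card, best_x = k, x.value

print("minimum cardinality over C:", best_card)
```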
Boolean LP as convex-cardinality problem
• Boolean LP:
minimize cᵀx
subject to Ax ⪯ b, xi ∈ {0, 1}
includes many famous (hard) problems, e.g., 3-SAT, traveling salesman
• can be expressed as
minimize cᵀx
subject to Ax ⪯ b, card(x) + card(1 − x) ≤ n
since card(x) + card(1 − x) ≤ n ⇐⇒ xi ∈ {0, 1} for all i
(for each i, at least one of xi, 1 − xi is nonzero, so the sum is always ≥ n, with equality exactly when each xi ∈ {0, 1})
• conclusion: general convex-cardinality problem is hard
Sparse design
minimize card(x)
subject to x ∈ C
• find sparsest design vector x that satisfies a set of specifications
• zero values of x simplify design, or correspond to components that
aren’t even needed
• examples:
– FIR filter design (zero coefficients reduce required hardware)
– antenna array beamforming (zero coefficients correspond to unneeded
antenna elements)
– truss design (zero coefficients correspond to bars that are not needed)
– wire sizing (zero coefficients correspond to wires that are not needed)
Sparse modeling / regressor selection
fit vector b ∈ R^m as a linear combination of k regressors (chosen from n possible regressors)
minimize ‖Ax − b‖2
subject to card(x) ≤ k
• gives k-term model
• chooses subset of k regressors that (together) best fit or explain b
• can solve (in principle) by trying all (n choose k) choices
• variations:
– minimize card(x) subject to ‖Ax − b‖2 ≤ ε
– minimize ‖Ax − b‖2 + λ card(x)
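A minimal sketch of the exhaustive approach, with small hypothetical data: for each of the (n choose k) column subsets, solve the restricted least-squares problem and keep the best.

```python
# Brute-force regressor selection: only practical when (n choose k) is small.
from itertools import combinations

import numpy as np

rng = np.random.default_rng(0)
m, n, k = 10, 20, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

best_res, best_subset = np.inf, None
for subset in combinations(range(n), k):
    cols = list(subset)
    xs, *_ = np.linalg.lstsq(A[:, cols], b, rcond=None)  # fit on this subset
    res = np.linalg.norm(A[:, cols] @ xs - b)
    if res < best_res:
        best_res, best_subset = res, cols

print("best regressors:", best_subset, " residual:", best_res)
```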
Sparse signal reconstruction
• estimate signal x, given
– noisy measurement y = Ax + v, v ∼ N(0, σ²I) (A is known; v is not)
– prior information card(x) ≤ k
• maximum likelihood estimate x̂ml is the solution of
minimize ‖Ax − y‖2
subject to card(x) ≤ k
Estimation with outliers
• we have measurements yi = aiᵀx + vi + wi, i = 1, . . . , m
• noises vi ∼ N(0, σ²) are independent
• only assumption on w is sparsity: card(w) ≤ k
• B = {i | wi ≠ 0} is set of bad measurements or outliers
• maximum likelihood estimate of x found by solving
minimize Σ_{i∉B} (yi − aiᵀx)²
subject to |B| ≤ k
with variables x and B ⊆ {1, . . . , m}
• equivalent to
minimize ‖y − Ax − w‖2²
subject to card(w) ≤ k
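As a preview of the ℓ1 heuristic developed on the following slides, one can replace card(w) with a penalty γ‖w‖1; a sketch with hypothetical data:

```python
# l1 heuristic for the outlier problem: penalize ||w||_1 instead of
# constraining card(w). Data below are hypothetical.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
m, n = 50, 5
A = rng.standard_normal((m, n))
x_true = rng.standard_normal(n)
y = A @ x_true + 0.1 * rng.standard_normal(m)
y[:5] += 10.0  # plant a few gross outliers

x = cp.Variable(n)
w = cp.Variable(m)
gamma = 1.0  # tune so that w has at most k nonzeros
obj = cp.sum_squares(y - A @ x - w) + gamma * cp.norm(w, 1)
cp.Problem(cp.Minimize(obj)).solve()
print("suspected outliers:", np.flatnonzero(np.abs(w.value) > 1e-3))
```

Minimizing over w analytically shows this penalized problem is equivalent to regression with the Huber loss, a classical robust estimator.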
Minimum number of violations
• set of convex inequalities
f1(x) ≤ 0, . . . , fm(x) ≤ 0, x ∈ C
• choose x to minimize the number of violated inequalities:
minimize card(t)
subject to fi(x) ≤ ti, i = 1, . . . , m
x ∈ C, t ≥ 0
• determining whether zero inequalities can be violated is an (easy) convex
feasibility problem
Linear classifier with fewest errors
• given data (x1, y1), . . . , (xm, ym) ∈ R^n × {−1, 1}
• we seek a linear (affine) classifier y ≈ sign(wᵀx + v)
• classification error corresponds to yi(wᵀxi + v) ≤ 0
• to find w, v that give fewest classification errors:
minimize card(t)
subject to yi(wᵀxi + v) + ti ≥ 1, i = 1, . . . , m
with variables w, v, t (we use homogeneity in w, v here)
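Applying the ℓ1-type heuristic here, i.e., replacing card(t) with 1ᵀt and requiring t ⪰ 0, and then eliminating t, gives the hinge loss Σi max{0, 1 − yi(wᵀxi + v)}, the loss minimized by support vector machines. A sketch with hypothetical data:

```python
# Hinge-loss classifier obtained from the l1-type heuristic for card(t).
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
m, n = 100, 2
X = rng.standard_normal((m, n))
y = np.sign(X @ np.array([1.0, -1.0]) + 0.3 + 0.5 * rng.standard_normal(m))

w, v = cp.Variable(n), cp.Variable()
hinge = cp.sum(cp.pos(1 - cp.multiply(y, X @ w + v)))  # sum of max(0, 1 - y_i f(x_i))
cp.Problem(cp.Minimize(hinge)).solve()
errors = int(np.sum(y * (X @ w.value + v.value) <= 0))
print("classification errors:", errors)
```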
Smallest set of mutually infeasible inequalities
• given a set of mutually infeasible convex inequalities
f1(x) ≤ 0, . . . , fm(x) ≤ 0
• find smallest (cardinality) subset of these that is infeasible
• certificate of infeasibility is g(λ) = inf_x (Σ_{i=1}^m λi fi(x)) ≥ 1, λ ⪰ 0
• to find smallest cardinality infeasible subset, we solve
minimize card(λ)
subject to g(λ) ≥ 1, λ ⪰ 0
(assuming some constraint qualifications)
Portfolio investment with linear and fixed costs
• we use budget B to purchase (dollar) amount xi ≥ 0 of stock i
• trading fee is fixed cost plus linear cost: β card(x) + αᵀx
• budget constraint is 1ᵀx + β card(x) + αᵀx ≤ B
• mean return on investment is μᵀx; variance is xᵀΣx
• minimize investment variance (risk) with mean return ≥ Rmin:
minimize xᵀΣx
subject to μᵀx ≥ Rmin, x ⪰ 0
1ᵀx + β card(x) + αᵀx ≤ B
Piecewise constant fitting
• fit corrupted xcor by a piecewise constant signal x̂ with k or fewer jumps
• problem is convex once location (indices) of jumps are fixed
• x̂ is piecewise constant with ≤ k jumps ⇐⇒ card(Dx̂) ≤ k, where D ∈ R^{(n−1)×n} is the first-order difference matrix

D = [ 1 −1            ]
    [    1 −1         ]
    [       ⋱  ⋱      ]
    [          1 −1   ]
• as convex-cardinality problem:
minimize ‖x̂ − xcor‖2
subject to card(Dx̂) ≤ k
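The ℓ1 heuristic replaces the cardinality constraint with a total variation penalty γ‖Dx̂‖1; a sketch with a hypothetical signal (cp.diff computes forward differences, the rows of D up to sign):

```python
# Total variation fitting: l1 heuristic for piecewise-constant signals.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n = 200
x_true = np.concatenate([np.zeros(80), np.ones(70), -0.5 * np.ones(50)])
x_cor = x_true + 0.1 * rng.standard_normal(n)  # corrupted signal

xhat = cp.Variable(n)
gamma = 1.0  # tune until D xhat has at most k nonzeros
obj = cp.sum_squares(xhat - x_cor) + gamma * cp.norm(cp.diff(xhat), 1)
cp.Problem(cp.Minimize(obj)).solve()
print("number of jumps:", int(np.sum(np.abs(np.diff(xhat.value)) > 1e-4)))
```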
Piecewise linear fitting
• fit xcor by a piecewise linear signal x̂ with k or fewer kinks
• as convex-cardinality problem:
minimize ‖x̂ − xcor‖2
subject to card(∇²x̂) ≤ k
where ∇² ∈ R^{(n−2)×n} is the second-order difference matrix

∇² = [ −1  2 −1               ]
     [     −1  2 −1           ]
     [          ⋱   ⋱   ⋱     ]
     [             −1  2 −1   ]
ℓ1-norm heuristic
• replace card(z) with γ‖z‖1, or add regularization term γ‖z‖1 to
objective
• γ > 0 is a parameter used to achieve desired sparsity
(when card appears in constraint, or as term in objective)
• more sophisticated versions use Σi wi|zi| or Σi wi(zi)+ + Σi vi(zi)−,
where w, v are positive weights
Example: Minimum cardinality problem
• start with (hard) minimum cardinality problem
minimize card(x)
subject to x ∈ C
(C convex)
• apply heuristic to get (easy) ℓ1-norm minimization problem
minimize ‖x‖1
subject to x ∈ C
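A minimal sketch of this heuristic in CVXPY, assuming C = {x : Gx ⪯ h} with hypothetical data:

```python
# l1-norm heuristic for minimum cardinality over a polyhedral C.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
m, n = 15, 40
G = rng.standard_normal((m, n))
h = rng.standard_normal(m)  # mixed signs, so x = 0 is typically infeasible

x = cp.Variable(n)
cp.Problem(cp.Minimize(cp.norm(x, 1)), [G @ x <= h]).solve()
print("heuristic card(x):", int(np.sum(np.abs(x.value) > 1e-6)))
```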
Example: Cardinality constrained problem
• start with (hard) cardinality constrained problem (f, C convex)
minimize f(x)
subject to x ∈ C, card(x) ≤ k
• apply heuristic to get (easy) ℓ1-constrained problem
minimize f(x)
subject to x ∈ C, ‖x‖1 ≤ β
or ℓ1-regularized problem
minimize f(x) + γ‖x‖1
subject to x ∈ C
β, γ adjusted so that card(x) ≤ k
Polishing
• use ℓ1 heuristic to find x̂ with required sparsity
• fix the sparsity pattern of x̂
• re-solve the (convex) optimization problem with this sparsity pattern to
obtain final (heuristic) solution
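A sketch of the polishing step for the least-squares case (the re-solve works the same way for any convex objective); the problem data A, b and the heuristic solution xhat are assumed given:

```python
# Polishing: fix the sparsity pattern of xhat, then re-solve on that support.
import numpy as np
import cvxpy as cp

def polish(A, b, xhat, tol=1e-6):
    """Re-solve least squares with the zero entries of xhat fixed to zero."""
    n = A.shape[1]
    support = np.abs(xhat) > tol  # nonzero pattern found by the l1 heuristic
    x = cp.Variable(n)
    constraints = [x[i] == 0 for i in range(n) if not support[i]]
    cp.Problem(cp.Minimize(cp.sum_squares(A @ x - b)), constraints).solve()
    return x.value
```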
Interpretation as convex relaxation
• start with
minimize card(x)
subject to x ∈ C, ‖x‖∞ ≤ R
• equivalent to mixed Boolean convex problem
minimize 1ᵀz
subject to |xi| ≤ Rzi, i = 1, . . . , n
x ∈ C, zi ∈ {0, 1}, i = 1, . . . , n
with variables x, z
• now relax zi ∈ {0, 1} to zi ∈ [0, 1] to obtain
minimize 1ᵀz
subject to |xi| ≤ Rzi, i = 1, . . . , n
x ∈ C
0 ≤ zi ≤ 1, i = 1, . . . , n
• for fixed feasible x, the optimal choice is zi = |xi|/R, so this is equivalent to
minimize (1/R)‖x‖1
subject to x ∈ C
‖x‖∞ ≤ R
i.e., the ℓ1 heuristic
• optimal value of the relaxation is a lower bound on the optimal value of the original problem
Interpretation via convex envelope
• the convex envelope f^env of a function f on a set C is the largest convex
function that is an underestimator of f on C
• epi(f^env) = Co(epi(f))
• f^env = (f*)* (with some technical conditions)
• for x scalar, |x| is the convex envelope of card(x) on [−1, 1]
• for x ∈ R^n, (1/R)‖x‖1 is the convex envelope of card(x) on
{z | ‖z‖∞ ≤ R}
Weighted and asymmetric ℓ1 heuristics
• minimize card(x) over convex set C
• suppose we know lower and upper bounds on xi over C:
x ∈ C =⇒ li ≤ xi ≤ ui
(best values for these can be found by solving 2n convex problems)
• if ui < 0 or li > 0, then card(xi) = 1 (i.e., xi ≠ 0) for all x ∈ C
• assuming li < 0, ui > 0, convex relaxation and convex envelope
interpretations suggest using
Σ_{i=1}^n ( (xi)+/ui + (xi)−/(−li) )
as surrogate (and also lower bound) for card(x)
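A sketch with hypothetical data: the bounds l, u are computed by solving 2n convex problems, then the asymmetric surrogate is minimized; a box constraint is added to C here so the bounds stay finite, and li < 0 < ui is assumed as on the slide.

```python
# Asymmetric l1 heuristic: per-coordinate bounds, then a weighted surrogate.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n = 8
G = rng.standard_normal((20, n))
h = rng.standard_normal(20) + 2.0

x = cp.Variable(n)
C = [G @ x <= h, cp.norm(x, "inf") <= 1]  # box added so bounds are finite
l, u = np.zeros(n), np.zeros(n)
for i in range(n):
    l[i] = cp.Problem(cp.Minimize(x[i]), C).solve()  # lower bound on x_i
    u[i] = cp.Problem(cp.Maximize(x[i]), C).solve()  # upper bound on x_i

# surrogate sum_i (x_i)_+ / u_i + (x_i)_- / (-l_i); assumes l_i < 0 < u_i
obj = cp.sum(cp.multiply(cp.pos(x), 1.0 / u) + cp.multiply(cp.neg(x), 1.0 / (-l)))
cp.Problem(cp.Minimize(obj), C).solve()
print("surrogate card:", int(np.sum(np.abs(x.value) > 1e-6)))
```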
Regressor selection
minimize ‖Ax − b‖2
subject to card(x) ≤ k
• heuristic (sketched below):
– minimize ‖Ax − b‖2 + γ‖x‖1
– find smallest value of γ that gives card(x) ≤ k
– fix associated sparsity pattern (i.e., subset of selected regressors) and
find x that minimizes ‖Ax − b‖2
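A sketch of the full procedure on hypothetical data, sweeping γ upward until card(x) ≤ k and then polishing on the selected support:

```python
# Regressor selection heuristic: gamma sweep plus polishing.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
m, n, k = 10, 20, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

x = cp.Variable(n)
gamma = cp.Parameter(nonneg=True)
prob = cp.Problem(cp.Minimize(cp.sum_squares(A @ x - b) + gamma * cp.norm(x, 1)))

support = None
for g in np.logspace(-3, 2, 50):  # increasing gamma -> increasing sparsity
    gamma.value = g
    prob.solve()
    if np.sum(np.abs(x.value) > 1e-6) <= k:
        support = np.flatnonzero(np.abs(x.value) > 1e-6)
        break

# polish: least squares restricted to the selected regressors
x_pol = np.zeros(n)
x_pol[support] = np.linalg.lstsq(A[:, list(support)], b, rcond=None)[0]
print("selected regressors:", support)
```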
Example (6.4 in BV book)
• A ∈ R^{10×20}, x ∈ R^{20}, b ∈ R^{10}
• dashed curve: exact optimal (via enumeration)
• solid curve: ℓ1 heuristic with polishing
[figure: card(x) (0 to 10) versus ‖Ax − b‖2 (0 to 4) tradeoff curves for the two methods]
Sparse signal reconstruction
• convex-cardinality problem:
minimize ‖Ax − y‖2
subject to card(x) ≤ k
• ℓ1 heuristic:
minimize ‖Ax − y‖2
subject to ‖x‖1 ≤ β
(called LASSO)
• another form: minimize ‖Ax − y‖2 + γ‖x‖1
(called basis pursuit denoising)
Example
• signal x ∈ R^n with n = 1000, card(x) = 30
• m = 200 (random) noisy measurements: y = Ax + v, v ∼ N(0, σ²I),
Aij ∼ N(0, 1)
• left: original; right: ℓ1 reconstruction with γ = 10^−3
[figure: original signal (left) and ℓ1 reconstruction (right); amplitudes from −1 to 1 over indices 1 to 1000]
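A sketch reproducing an experiment of this shape; the noise level σ is not given on the slide, so the value below is an assumption.

```python
# Sparse reconstruction experiment: n = 1000, card(x) = 30, m = 200.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, m, k = 1000, 200, 30
sigma = 0.01  # assumed; not specified on the slide

x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.uniform(-1, 1, k)
A = rng.standard_normal((m, n))
y = A @ x_true + sigma * rng.standard_normal(m)

x = cp.Variable(n)
gamma = 1e-3
cp.Problem(cp.Minimize(cp.norm(A @ x - y, 2) + gamma * cp.norm(x, 1))).solve()
print("reconstruction error:", np.linalg.norm(x.value - x_true))
```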
Some recent theoretical results
• suppose y = Ax, A ∈ R^{m×n}, card(x) ≤ k
• to reconstruct x, clearly need m ≥ k
• if m ≥ n and A is full rank, we can reconstruct x without cardinality
assumption
• when does the ℓ1 heuristic (minimizing ‖x‖1 subject to Ax = y)
reconstruct x (exactly)?
recent results by Candès, Donoho, Romberg, Tao, . . .
• (for some choices of A) if m ≥ (C log n)k, ℓ1 heuristic reconstructs x
exactly, with overwhelming probability
• C is an absolute constant; valid A’s include
– Aij ∼ N(0, σ²)
– Ax gives Fourier transform of x at m frequencies, chosen from
uniform distribution
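A numerical check of this phenomenon (a sketch, not the theory): with Gaussian A and m a modest multiple of k log n, basis pursuit typically recovers x exactly; the factor 4 below is an arbitrary choice, not the constant C.

```python
# Exact recovery check: minimize ||x||_1 subject to Ax = y.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, k = 200, 5
m = int(4 * k * np.log(n))  # heuristic choice of m, comfortably above k

x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
A = rng.standard_normal((m, n))
y = A @ x_true

x = cp.Variable(n)
cp.Problem(cp.Minimize(cp.norm(x, 1)), [A @ x == y]).solve()
print("exact recovery:", np.allclose(x.value, x_true, atol=1e-5))
```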