ℓ1-norm Methods for Convex-Cardinality Problems
• problems involving cardinality
• the ℓ1-norm heuristic
• convex relaxation and convex envelope interpretations
• examples
• recent results
EE364b, Stanford University
ℓ1-norm heuristics for cardinality problems
• cardinality problems arise often, but are hard to solve exactly
• a simple heuristic based on the ℓ1-norm seems to work well
• used for many years, in many fields
– sparse design
– LASSO, robust estimation in statistics
– support vector machine (SVM) in machine learning
– total variation reconstruction in signal processing, geophysics
– compressed sensing
• new theoretical results guarantee the method works, at least for a few
problems
Cardinality
• the cardinality of x ∈ R^n, denoted card(x), is the number of nonzero
components of x
• card is separable; for scalar x, card(x) = 0 if x = 0, and card(x) = 1 if x ≠ 0
• card is quasiconcave on R^n_+ (but not on R^n) since
card(x + y) ≥ min{card(x), card(y)}
holds for x, y ⪰ 0
• but otherwise has no convexity properties
• arises in many problems
General convex-cardinality problems
a convex-cardinality problem is one that would be convex, except for
appearance of card in objective or constraints
examples (with C, f convex):
• convex minimum cardinality problem:
minimize card(x)
subject to x ∈ C
• convex problem with cardinality constraint:
minimize f(x)
subject to x ∈ C, card(x) ≤ k
Solving convex-cardinality problems
convex-cardinality problem with x ∈ Rn
• if we fix the sparsity pattern of x (i.e., which entries are zero/nonzero)
we get a convex problem
• by solving the 2^n convex problems associated with all possible sparsity
patterns, we can solve the convex-cardinality problem (see the enumeration
sketch below)
(possibly practical for n ≤ 10; not practical for n > 15 or so . . . )
• general convex-cardinality problem is (NP-) hard
• can solve globally by branch-and-bound
– can work for particular problem instances (with some luck)
– in worst case reduces to checking all (or many of) the 2^n sparsity patterns
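A minimal sketch of the enumeration approach, using a least-squares objective (the regressor-selection problem of a later slide) as the convex subproblem; the function name and pure-numpy design are illustrative assumptions, and this is practical only for small n:

import itertools
import numpy as np

# exhaustive solution of: minimize ||Ax - b||_2 subject to card(x) <= k,
# solving the convex (least-squares) subproblem for every sparsity pattern
def min_card_by_enumeration(A, b, k):
    m, n = A.shape
    best_val, best_x = np.inf, np.zeros(n)
    for r in range(k + 1):
        for support in itertools.combinations(range(n), r):
            cols = list(support)
            x = np.zeros(n)
            if cols:
                x[cols] = np.linalg.lstsq(A[:, cols], b, rcond=None)[0]
            val = np.linalg.norm(A @ x - b)
            if val < best_val:
                best_val, best_x = val, x
    return best_x, best_val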
Boolean LP as convex-cardinality problem
• Boolean LP:
minimize cᵀx
subject to Ax ⪯ b, xi ∈ {0, 1}
includes many famous (hard) problems, e.g., 3-SAT, traveling salesman
• can be expressed as
minimize cᵀx
subject to Ax ⪯ b, card(x) + card(1 − x) ≤ n
since card(x) + card(1 − x) ≤ n ⇐⇒ xi ∈ {0, 1}
• conclusion: general convex-cardinality problem is hard
Sparse design
minimize card(x)
subject to x ∈ C
• find sparsest design vector x that satisfies a set of specifications
• zero values of x simplify design, or correspond to components that
aren’t even needed
• examples:
– FIR filter design (zero coefficients reduce required hardware)
– antenna array beamforming (zero coefficients correspond to unneeded
antenna elements)
– truss design (zero coefficients correspond to bars that are not needed)
– wire sizing (zero coefficients correspond to wires that are not needed)
Sparse modeling / regressor selection
fit vector b ∈ R^m as a linear combination of k regressors (chosen from n
possible regressors)
minimize ‖Ax − b‖₂
subject to card(x) ≤ k
• gives k-term model
• chooses subset of k regressors that (together) best fit or explain b
• can solve (in principle) by trying all (n choose k) choices
• variations:
– minimize card(x) subject to ‖Ax − b‖₂ ≤ ε
– minimize ‖Ax − b‖₂ + λ card(x)
Sparse signal reconstruction
• estimate signal x, given
– noisy measurement y = Ax + v, v ∼ N(0, σ²I) (A is known; v is not)
– prior information card(x) ≤ k
• maximum likelihood estimate x̂ml is solution of
minimize ‖Ax − y‖₂
subject to card(x) ≤ k
Estimation with outliers
• we have measurements yi = aiᵀx + vi + wi, i = 1, . . . , m
• noises vi ∼ N(0, σ²) are independent
• only assumption on w is sparsity: card(w) ≤ k
• B = {i | wi ≠ 0} is set of bad measurements or outliers
• maximum likelihood estimate of x found by solving
minimize Σi∉B (yi − aiᵀx)²
subject to |B| ≤ k
with variables x and B ⊆ {1, . . . , m}
• equivalent to
minimize ‖y − Ax − w‖₂²
subject to card(w) ≤ k
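Anticipating the ℓ1 heuristic introduced on later slides, one workable relaxation replaces the constraint card(w) ≤ k with a penalty γ‖w‖₁; a minimal cvxpy sketch, where γ and the detection threshold are tuning choices, not part of the formulation above:

import cvxpy as cp
import numpy as np

def robust_fit(A, y, gamma, thresh=1e-6):
    # minimize ||y - A x - w||_2^2 + gamma * ||w||_1 over x and outlier vector w
    m, n = A.shape
    x, w = cp.Variable(n), cp.Variable(m)
    obj = cp.sum_squares(y - A @ x - w) + gamma * cp.norm1(w)
    cp.Problem(cp.Minimize(obj)).solve()
    B = np.flatnonzero(np.abs(w.value) > thresh)   # estimated outlier set
    return x.value, B

(minimizing the penalized objective over w in closed form yields the Huber penalty on the residuals, a standard observation that explains the robustness of this heuristic)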
Minimum number of violations
• set of convex inequalities
f1(x) ≤ 0, . . . , fm(x) ≤ 0, x ∈ C
• choose x to minimize the number of violated inequalities:
minimize card(t)
subject to fi(x) ≤ ti, i = 1, . . . , m
x ∈ C, t ≥ 0
• determining whether zero inequalities can be violated is (easy) convex
feasibility problem
Linear classifier with fewest errors
• given data (x1, y1), . . . , (xm, ym) ∈ R^n × {−1, 1}
• we seek linear (affine) classifier y ≈ sign(wᵀx + v)
• classification error corresponds to yi(wᵀxi + v) ≤ 0
• to find w, v that give fewest classification errors:
minimize card(t)
subject to yi(wᵀxi + v) + ti ≥ 1, i = 1, . . . , m
with variables w, v, t (we use homogeneity in w, v here)
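Replacing card(t) by 1ᵀt (the ℓ1 heuristic of the coming slides; here t ≥ 0, so ‖t‖₁ = 1ᵀt) turns this into the familiar hinge-loss LP; a minimal cvxpy sketch, with X ∈ R^(m×n) and y ∈ {−1, 1}^m as assumed names for the stacked data:

import cvxpy as cp

def fewest_errors_heuristic(X, y):
    m, n = X.shape
    w, v = cp.Variable(n), cp.Variable()
    t = cp.Variable(m, nonneg=True)                  # per-point slack
    cons = [cp.multiply(y, X @ w + v) + t >= 1]
    cp.Problem(cp.Minimize(cp.sum(t)), cons).solve()
    return w.value, v.value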
Smallest set of mutually infeasible inequalities
• given a set of mutually infeasible convex inequalities
f1(x) ≤ 0, . . . , fm(x) ≤ 0
• find smallest (cardinality) subset of these that is infeasible
• certificate of infeasibility is g(λ) = inf_x ( Σi λifi(x) ) ≥ 1, λ ⪰ 0
• to find smallest cardinality infeasible subset, we solve
minimize card(λ)
subject to g(λ) ≥ 1, λ ⪰ 0
(assuming some constraint qualifications)
Portfolio investment with linear and fixed costs
• we use budget B to purchase (dollar) amount xi ≥ 0 of stock i
• trading fee is fixed cost plus linear cost: β card(x) + αᵀx
• budget constraint is 1ᵀx + β card(x) + αᵀx ≤ B
• mean return on investment is µᵀx; variance is xᵀΣx
• minimize investment variance (risk) with mean return ≥ Rmin:
minimize xᵀΣx
subject to µᵀx ≥ Rmin, x ⪰ 0
1ᵀx + β card(x) + αᵀx ≤ B
Piecewise constant fitting
• fit corrupted xcor by a piecewise constant signal x̂ with k or fewer jumps
• problem is convex once location (indices) of jumps are fixed
• x̂ is piecewise constant with ≤ k jumps ⇐⇒ card(Dx̂) ≤ k, where D is the
first-difference matrix

D = [ 1 −1          ]
    [    1 −1       ]
    [      ⋱  ⋱     ]
    [         1 −1  ]  ∈ R^((n−1)×n)
• as convex-cardinality problem:
minimize ‖x̂ − xcor‖₂
subject to card(Dx̂) ≤ k
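Under the ℓ1 heuristic of the next slides, card(Dx̂) is replaced by ‖Dx̂‖₁, giving total variation denoising; a minimal cvxpy sketch, with γ increased until the fit has at most k jumps:

import cvxpy as cp
import numpy as np

def pwc_fit(xcor, gamma):
    # minimize ||xhat - xcor||_2 + gamma * ||D xhat||_1  (total variation fit)
    n = xcor.size
    D = np.eye(n - 1, n) - np.eye(n - 1, n, k=1)     # first-difference matrix
    xhat = cp.Variable(n)
    obj = cp.norm(xhat - xcor, 2) + gamma * cp.norm1(D @ xhat)
    cp.Problem(cp.Minimize(obj)).solve()
    return xhat.value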
Piecewise linear fitting
• fit xcor by a piecewise linear signal x̂ with k or fewer kinks
• as convex-cardinality problem:
minimize ‖x̂ − xcor‖₂
subject to card(∇²x̂) ≤ k
where ∇² is the second-difference matrix

∇² = [ −1  2 −1            ]
     [    −1  2 −1         ]
     [        ⋱  ⋱  ⋱      ]
     [          −1  2 −1   ]  ∈ R^((n−2)×n)
ℓ1-norm heuristic
• replace card(z) with γ‖z‖₁, or add regularization term γ‖z‖₁ to
objective
• γ > 0 is a parameter used to achieve desired sparsity
(when card appears in constraint, or as term in objective)
• more sophisticated versions use Σi wi|zi| or Σi wi(zi)+ + Σi vi(zi)−,
where w, v are positive weights
Example: Minimum cardinality problem
• start with (hard) minimum cardinality problem
minimize card(x)
subject to x ∈ C
(C convex)
• apply heuristic to get (easy) ℓ1-norm minimization problem
minimize ‖x‖₁
subject to x ∈ C
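A minimal cvxpy sketch, taking C = {x : Ax ⪯ b} as an illustrative stand-in for the convex set:

import cvxpy as cp

def min_card_l1(A, b):
    # heuristic for: minimize card(x) subject to x in C = {x : A x <= b}
    x = cp.Variable(A.shape[1])
    cp.Problem(cp.Minimize(cp.norm1(x)), [A @ x <= b]).solve()
    return x.value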
Example: Cardinality constrained problem
• start with (hard) cardinality constrained problem (f, C convex)
minimize f(x)
subject to x ∈ C, card(x) ≤ k
• apply heuristic to get (easy) ℓ1-constrained problem
minimize f(x)
subject to x ∈ C, ‖x‖₁ ≤ β
or ℓ1-regularized problem
minimize f(x) + γ‖x‖₁
subject to x ∈ C
β, γ adjusted so that card(x) ≤ k
Polishing
• use ℓ1 heuristic to find x̂ with required sparsity
• fix the sparsity pattern of x̂
• re-solve the (convex) optimization problem with this sparsity pattern to
obtain final (heuristic) solution
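A minimal sketch of polishing, assuming a least-squares objective as the convex problem and a small threshold for reading off the sparsity pattern of x̂:

import cvxpy as cp
import numpy as np

def polish(A, b, xhat, thresh=1e-6):
    # re-solve the convex problem with the sparsity pattern of xhat fixed
    n = xhat.size
    zero_idx = np.flatnonzero(np.abs(xhat) <= thresh)
    x = cp.Variable(n)
    cons = [x[zero_idx] == 0] if zero_idx.size else []   # zeros stay zero
    cp.Problem(cp.Minimize(cp.sum_squares(A @ x - b)), cons).solve()
    return x.value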
Interpretation as convex relaxation
• start with
minimize card(x)
subject to x ∈ C, ‖x‖∞ ≤ R
• equivalent to mixed Boolean convex problem
minimize 1ᵀz
subject to |xi| ≤ Rzi, i = 1, . . . , n
x ∈ C, zi ∈ {0, 1}, i = 1, . . . , n
with variables x, z
• now relax zi ∈ {0, 1} to zi ∈ [0, 1] to obtain
minimize 1ᵀz
subject to |xi| ≤ Rzi, i = 1, . . . , n
x ∈ C
0 ≤ zi ≤ 1, i = 1, . . . , n
which is equivalent to
minimize (1/R)‖x‖₁
subject to x ∈ C
‖x‖∞ ≤ R
the ℓ1 heuristic
• optimal value of this problem is lower bound on original problem
Interpretation via convex envelope
• convex envelope f^env of a function f on set C is the largest convex
function that is an underestimator of f on C
• epi(f^env) = Co(epi(f))
• f^env = (f∗)∗ (with some technical conditions)
• for x scalar, |x| is the convex envelope of card(x) on [−1, 1]
• for x ∈ R^n, (1/R)‖x‖₁ is convex envelope of card(x) on
{z | ‖z‖∞ ≤ R}
Weighted and asymmetric ℓ1 heuristics
• minimize card(x) over convex set C
• suppose we know lower and upper bounds on xi over C
x ∈ C =⇒ li ≤ xi ≤ ui
(best values for these can be found by solving 2n convex problems)
• if ui < 0 or li > 0, then card(xi) = 1 (i.e., xi ≠ 0) for all x ∈ C
• assuming li < 0, ui > 0, convex relaxation and convex envelope
interpretations suggest using

Σi=1,...,n ( (xi)+/ui + (xi)−/(−li) )

as surrogate (and also lower bound) for card(x)
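A minimal cvxpy sketch of the asymmetric surrogate, again with C = {x : Ax ⪯ b} as an assumed stand-in, and with l < 0 < u elementwise:

import cvxpy as cp

def asym_min_card(A, b, l, u):
    # minimize sum_i ( (x_i)_+ / u_i + (x_i)_- / (-l_i) ) over C = {x : A x <= b}
    x = cp.Variable(A.shape[1])
    obj = cp.sum(cp.multiply(1.0 / u, cp.pos(x)) + cp.multiply(1.0 / (-l), cp.neg(x)))
    cp.Problem(cp.Minimize(obj), [A @ x <= b]).solve()
    return x.value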
Regressor selection
minimize ‖Ax − b‖₂
subject to card(x) ≤ k
• heuristic:
– minimize ‖Ax − b‖₂ + γ‖x‖₁
– find smallest value of γ that gives card(x) ≤ k
– fix associated sparsity pattern (i.e., subset of selected regressors) and
find x that minimizes ‖Ax − b‖₂ (see the sketch below)
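A minimal sketch of the full heuristic: sweep γ upward to the smallest value giving card(x) ≤ k, then polish by least squares on the selected support (the grid and threshold are tuning choices):

import cvxpy as cp
import numpy as np

def select_regressors(A, b, k, gammas=np.logspace(-4, 2, 60), thresh=1e-6):
    m, n = A.shape
    x = cp.Variable(n)
    gamma = cp.Parameter(nonneg=True)
    prob = cp.Problem(cp.Minimize(cp.norm(A @ x - b, 2) + gamma * cp.norm1(x)))
    S = np.arange(n)
    for g in gammas:                     # smallest gamma with card(x) <= k
        gamma.value = g
        prob.solve()
        S = np.flatnonzero(np.abs(x.value) > thresh)
        if S.size <= k:
            break
    xp = np.zeros(n)                     # polishing: least squares on support S
    if S.size:
        xp[S] = np.linalg.lstsq(A[:, S], b, rcond=None)[0]
    return xp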
Example (6.4 in BV book)
• A ∈ R^(10×20), x ∈ R^20, b ∈ R^10
• dashed curve: exact optimal (via enumeration)
• solid curve: ℓ1 heuristic with polishing
[figure: trade-off curves of card(x) versus ‖Ax − b‖₂, with ‖Ax − b‖₂ ranging over [0, 4] and card(x) over [0, 10]]
Sparse signal reconstruction
• convex-cardinality problem:
minimize ‖Ax − y‖₂
subject to card(x) ≤ k
• ℓ1 heuristic:
minimize ‖Ax − y‖₂
subject to ‖x‖₁ ≤ β
(called LASSO)
• another form: minimize ‖Ax − y‖₂ + γ‖x‖₁
(called basis pursuit denoising)
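Minimal cvxpy sketches of both forms, following the naming on this slide (note the unsquared ℓ2 objective):

import cvxpy as cp

def lasso(A, y, beta):
    # minimize ||A x - y||_2  subject to ||x||_1 <= beta
    x = cp.Variable(A.shape[1])
    cp.Problem(cp.Minimize(cp.norm(A @ x - y, 2)), [cp.norm1(x) <= beta]).solve()
    return x.value

def bpdn(A, y, gamma):
    # minimize ||A x - y||_2 + gamma * ||x||_1
    x = cp.Variable(A.shape[1])
    cp.Problem(cp.Minimize(cp.norm(A @ x - y, 2) + gamma * cp.norm1(x))).solve()
    return x.value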
Example
• signal x ∈ R^n with n = 1000, card(x) = 30
• m = 200 (random) noisy measurements: y = Ax + v, v ∼ N(0, σ²I),
Aij ∼ N(0, 1)
• left: original; right: ℓ1 reconstruction with γ = 10⁻³
[two plots: original signal (left) and ℓ1 reconstruction (right); amplitude from −1 to 1 over indices 1 to 1000]
• ℓ2 reconstruction; minimizes ‖Ax − y‖₂ + γ‖x‖₂, where γ = 10⁻³
• left: original; right: ℓ2 reconstruction
[two plots: original signal (left) and ℓ2 reconstruction (right); amplitude from −1 to 1 over indices 1 to 1000]
Some recent theoretical results
• suppose y = Ax, A ∈ R^(m×n), card(x) ≤ k
• to reconstruct x, clearly need m ≥ k
• if m ≥ n and A is full rank, we can reconstruct x without cardinality
assumption
• when does the ℓ1 heuristic (minimizing ‖x‖₁ subject to Ax = y)
reconstruct x (exactly)?
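A quick numerical experiment in the spirit of the earlier example (same dimensions; the random seed and printout are arbitrary choices); for Gaussian A at these sizes the ℓ1 heuristic typically recovers x to solver precision:

import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 1000, 200, 30
x0 = np.zeros(n)
x0[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
A = rng.standard_normal((m, n))
y = A @ x0                               # noise-free measurements

x = cp.Variable(n)
cp.Problem(cp.Minimize(cp.norm1(x)), [A @ x == y]).solve()
print("recovery error:", np.linalg.norm(x.value - x0))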
recent results by Candès, Donoho, Romberg, Tao, . . .
• (for some choices of A) if m ≥ (C log n)k, ℓ1 heuristic reconstructs x
exactly, with overwhelming probability
• C is absolute constant; valid A’s include
– Aij ∼ N(0, σ²)
– Ax gives Fourier transform of x at m frequencies, chosen from
uniform distribution