A Splitting Method for Nonsmooth Nonconvex Problems

Peng Zheng and Aleksandr Aravkin
Applied Math & eScience, UW
Problem Class

We are interested in optimization problems of the form

    \min_x \; h(Ax) + g(x)

• h is a nonsmooth, nonconvex separable function
• A is a linear map (for now; in general we need a nonlinear map F(x))
• g(x) is a convex regularizer

We see these problems at the bleeding edge of very interesting applications:
• Optics (e.g. phase retrieval)
• Radiation therapy (nonconvex constraints)
• Chemistry (predicting structures of chromosomes/peptides/proteins)
Applications in this talk

• Exact Phase Retrieval:
    \min_x \; \| |Ax| - b \|_1, \quad x \in \mathbb{C}^n.

• Semi-Supervised SVMs:
    \min_{\xi,\beta} \; \frac{\lambda}{2}\|\xi\|_H^2 + \sum_{i=1}^{s} [1 - b_i l_i(\xi,\beta)]_+ + \tau \sum_{i=s+1}^{m} [1 - |l_i(\xi,\beta)|]_+.

• Stochastic Shortest Path:
    \min_x \; \sum_{i=1}^{d} \bigl| \min\{ \langle u_i^1, x \rangle + v_i^1 - x_i,\; \langle u_i^2, x \rangle + v_i^2 - x_i \} \bigr|.

• Exact Robust PCA:
    \min_{L,R} \; \|D - LR\|_1.
A Possible Approach

Define
    f(x) := h(Ax) + g(x)
and consider the relaxed objective
    \min_{x,w} \; f_\nu(x, w) := h(w) + \frac{1}{2\nu}\|Ax - w\|^2 + g(x).
If we partially minimize in w, we effectively replace h with its Moreau envelope:
    \min_x \; h_\nu(Ax) + g(x), \qquad h_\nu(Ax) := \min_w \; \frac{1}{2\nu}\|w - Ax\|^2 + h(w).
Simple Nonconvex Example

Consider a 1D phase retrieval objective function and its relaxed version,

    p(x) = ||x| - 1|, \qquad p_\nu(x) = \min_y \; \frac{1}{2\nu}(y - x)^2 + ||y| - 1|.

Figure: p(x) (original) and its relaxations p_\nu(x) for \nu = 0.5, 1, 10.
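The relaxation can be checked numerically. A minimal sketch (not from the slides; the grid range and resolution are arbitrary choices) that evaluates p_ν on a grid by brute-force minimization over y:

```python
import numpy as np

def p(x):
    # 1D exact phase retrieval objective p(x) = ||x| - 1|
    return np.abs(np.abs(x) - 1.0)

def moreau_env(xs, nu, ys=np.linspace(-6.0, 6.0, 6001)):
    # p_nu(x) = min_y (1/(2 nu)) (y - x)^2 + p(y), minimized over a grid of y values
    X = np.asarray(xs, dtype=float)[:, None]
    vals = (ys[None, :] - X) ** 2 / (2.0 * nu) + p(ys)[None, :]
    return vals.min(axis=1)
```

As in the plot, the envelope lower-bounds p everywhere, agrees with it at the minimizers x = ±1, and flattens out as ν grows.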
A Better Approach

Define
    f(x) := h(Ax) + g(x)
and consider the relaxed objective
    \min_{x,w} \; f_\nu(x, w) := h(w) + \frac{1}{2\nu}\|Ax - w\|^2 + g(x).
Partially minimize in x instead:
    \min_w \; h(w) + g_\nu(w), \qquad g_\nu(w) := \min_x \; \frac{1}{2\nu}\|Ax - w\|^2 + g(x).
Algorithm

A simple algorithm: Zheng and Aravkin (2018).

Algorithm 1 Proximal Gradient Descent for h(w) + g_\nu(w)
  1: Input: w_0
  2: Initialize: k = 0
  3: while not converged do
  4:     w_{k+1} \gets \arg\min_w \; h(w) + \frac{1}{2\nu}\|w - Ax_k\|^2
  5:     x_{k+1} \gets \arg\min_x \; g(x) + \frac{1}{2\nu}\|Ax - w_{k+1}\|^2
  6:     k \gets k + 1
  7: Output: w_k

The algorithm requires only two ingredients:
1. An oracle for the x update
2. A prox operator for h
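As a sketch (not the authors' code), the two ingredients can be instantiated for real-valued exact phase retrieval: h(w) = ‖|w| − b‖₁ has an elementwise prox, and taking g = 0 makes the x update a plain least-squares solve. Since each step exactly minimizes f_ν in one block, the objective cannot increase:

```python
import numpy as np

def prox_h(z, b, nu):
    # prox of nu * || |w| - b ||_1 at z (real case, elementwise):
    # keep the sign of z; the magnitude gets the prox of t -> |t - b_i|.
    s = np.where(z >= 0, 1.0, -1.0)
    t = np.abs(z) - b
    mag = b + np.sign(t) * np.maximum(np.abs(t) - nu, 0.0)  # soft threshold
    return s * mag

def f_nu(x, w, A, b, nu):
    # relaxed objective || |w| - b ||_1 + (1/(2 nu)) ||Ax - w||^2
    return np.sum(np.abs(np.abs(w) - b)) + np.sum((A @ x - w) ** 2) / (2.0 * nu)

def split_solve(A, b, nu, iters, x0):
    # Algorithm 1 with g = 0: alternate the w prox and a least-squares x update
    x = x0.copy()
    history = []
    for _ in range(iters):
        w = prox_h(A @ x, b, nu)                   # w-update: prox of h
        x = np.linalg.lstsq(A, w, rcond=None)[0]   # x-update: oracle (g = 0)
        history.append(f_nu(x, w, A, b, nu))
    return x, w, np.array(history)
```

Because both block updates are exact minimizations, the recorded objective values are monotonically nonincreasing.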
Critical Points and Optimality Condition

Definition (Critical Points and Optimality Condition)
A point (\bar{x}, \bar{w}) \in \mathbb{R}^d \times \mathbb{R}^m is a critical point for f_\nu if it satisfies the inclusions

    0 \in \partial h(\bar{w}) + \frac{1}{\nu}(\bar{w} - A\bar{x}),
    0 \in \partial g(\bar{x}) + \frac{1}{\nu} A^T (A\bar{x} - \bar{w}),

where \partial h, \partial g are the limiting subdifferentials of h, g (Rockafellar and Wets (1998)).

Define also

    T_\nu(x, w) = \min\bigl\{ \|v\|^2 + \|u\|^2 : v \in \partial h(w) + \tfrac{1}{\nu}(w - Ax),\; u \in \partial g(x) + \tfrac{1}{\nu} A^T (Ax - w) \bigr\}.
Summary of Convergence Results

Results for h(w) + g_\nu(w):

A1: T_\nu^k \le \frac{2}{\nu k}\,[f_\nu(x_0, w_0) - f^*]
A2: f_\nu(x_k, w_k) - f_\nu^* \le \frac{\|w_0 - w^*\|^2}{2\nu(k+1)}
A3: \|w_{k+1} - w^*\|^2 \le \frac{1}{1 + \alpha\nu}\|w_k - w^*\|^2
A4: \|w_{k+1} - w^*\| \le \frac{1}{\alpha\nu}\|w_k - w^*\|^2

Assumptions:
A1: h is prox-bounded, g is closed and convex.
A2: h and g are both proper closed convex functions.
A3: h is \alpha-strongly convex and g = 0.
A4: h is proper closed convex, g = 0, and there exists a sharp minimum of f_\nu.
Sharpness

Definition
The minimizer w^* of p is sharp if there exist \delta, \alpha > 0 so that for any w \in \{w : \|w - w^*\| \le \delta\},

    p(w) - p(w^*) \ge \alpha \|w - w^*\|.
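A concrete instance (our example, not from the slides): p(w) = |w| has a sharp minimizer at w^* = 0, since

```latex
p(w) - p(w^\ast) \;=\; |w| \;=\; 1 \cdot \|w - w^\ast\| \quad \text{for all } w,
```

so the definition holds with \alpha = 1 and any \delta > 0; by contrast, the smooth function w^2 has no such linear lower bound near 0 and is not sharp there.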
Results for Phase Retrieval

    \min_{x, w \in \mathbb{C}^n} \; f_\nu(x, w) := \| |w| - b \|_1 + \frac{1}{2\nu}\|Ax - w\|^2

Figure: Convergence history for large-scale phase retrieval (objective value f(x, w) - f^* and optimality condition \|w_k - w_{k-1}\| over iterations).
Results for Phase Retrieval

Figure: Large example (d = 3 × 2^22, n = 2^22, m = 3n). Original picture (left), initial point (middle), and final result (right).
Comparison with Other Methods
we compare Algorithm 1 with several other popular methods studied by Duchi
and Ruan (2017); Davis et al. (2017).
objective n d m # FHT
Alg 1 |Ax| − b 1 2048 × 2048 3 × 222
m = 3d 518
Alg 22
(Ax)2
− b 1 2048 × 2048 3 × 222
m = 3d 1530
Alg 33
(Ax)2
− b 1 1024 × 1024 3 × 220
m = 3d 15100
Table: Comparison summary. n represents the size of the pictures, d is the dimension of the
vectorized picture, and m is the number of measurements. FHT is a fast Hadamard transform.
The counts of FHT include initialization.
• Get a simple fast method (in terms of FHT) for phase retrieval
• Requires prox of piecewise linear function, and matrix-vector multiplication.
2
Davis et al. (2017)
3
Duchi and Ruan (2017)
12 / 21
Semi-Supervised Learning

Finite-dimensional kernel SVM formulation:

    \min_{x,\beta} \; \frac{\lambda}{2}\|x\|_K^2 + \underbrace{\sum_{i=1}^{s} [1 - b_i l_i(x,\beta)]_+ + \tau \sum_{i=s+1}^{m} [1 - |l_i(x,\beta)|]_+}_{h_\tau(Kx + \beta \mathbf{1})}

• l_i(x, \beta) = \langle \phi(A)x, \phi(a_i) \rangle_H + \beta
• K = \{\langle \phi(a_i), \phi(a_j) \rangle_H\} is the kernel matrix.
• The |l_i(x, \beta)| terms are used for the unlabeled examples i \in \{s+1, \ldots, m\}.

Relaxed objective:

    \min_{w,x,\beta} \; h_\tau(w) + \frac{1}{2\nu}\|Kx + \beta\mathbf{1} - w\|^2 + \underbrace{\frac{\lambda}{2}\|x\|_K^2}_{g(x)}
Results

Figure: Convergence history (objective value f(x, w) - f^* and optimality condition \|w_k - w_{k-1}\| over iterations), and training/testing error (percentage of misfits) for different values of \tau.

• First method (that we have seen) for semi-supervised kernel machines.
• Requires least-squares problems over the kernel matrix, and a prox operator.
Stochastic Shortest Path

Figure: Given two action graphs, we want to move from A to B. At each node we can switch between the black and red graphs depending on the expected cost, and take available edges uniformly at random.
Stochastic Shortest Path

Using the Bellman equation, the problem is formulated as

    \min_{x \in \mathbb{R}^d} \; \sum_{i=1}^{d} \bigl| \min\{ \langle U^1_{i\cdot}, x \rangle + v^1_i - x_i,\; \langle U^2_{i\cdot}, x \rangle + v^2_i - x_i \} \bigr|.

• U^k is the connectivity matrix for graph k
• v^k_i is the expected cost of leaving node i in graph k

Relaxed objective:

    \min_{x, w^1, w^2} \; h(w^1, w^2) + \frac{1}{2\nu}\bigl( \|A^1 x - w^1\|^2 + \|A^2 x - w^2\|^2 \bigr),

where A^k = U^k - I and

    h(w^1, w^2) = \sum_{i=1}^{d} \bigl| \min\{ w^1_i + v^1_i,\; w^2_i + v^2_i \} \bigr|.
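To make the Bellman-residual objective concrete, here is a small sketch (our toy example, not from the slides): a two-node chain where node 2 is the absorbing target and graph k moves from node 1 to node 2 at cost c_k. The value function x = (min(c_1, c_2), 0) makes every residual, and hence the objective, zero:

```python
import numpy as np

def ssp_objective(x, Us, vs):
    # sum_i | min_k ( <U^k_{i.}, x> + v^k_i - x_i ) |  -- the Bellman residual
    residuals = np.stack([U @ x + v - x for U, v in zip(Us, vs)])
    return np.abs(residuals.min(axis=0)).sum()

# two graphs on two nodes; both send every node to node 2 (the target)
U = np.array([[0.0, 1.0],
              [0.0, 1.0]])
c1, c2 = 3.0, 5.0
vs = [np.array([c1, 0.0]), np.array([c2, 0.0])]
x_star = np.array([min(c1, c2), 0.0])
```

Evaluating ssp_objective(x_star, [U, U], vs) gives 0, while perturbing x_star gives a positive value, consistent with the slide's observation that the optimal value is 0.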
Result

Figure: Convergence history (objective value f(x, w) - f^* and optimality condition over iterations).

• The optimal value is 0, so we know we solve the original problem.
• Previous methods use subgradients; we just need least squares and a prox.
Open Problems

• What happens with nonlinear nonconvex composite models?
    \min_x \; h(F(x)) + g(x)
• How to do \nu continuation?
    \min_{x,w} \; h(w) + \frac{1}{2\nu}\|Ax - w\|^2 + g(x)
• What's the best way to proceed for large-scale problems, where we have to solve the x-subproblem inexactly?
Empirical Result for Problem 1

Exact robust PCA:

    \min_{L,R} \; \|D - LR\|_1, \quad L \in \mathbb{R}^{m \times k}, \; R \in \mathbb{R}^{k \times n}.

Relaxed objective:

    \min_{L,R,W} \; \|D - W\|_1 + \frac{1}{2\nu}\|W - LR\|_F^2.

Even though the problem is technically in the F(x) class, for x = (L, R), it is special: the SVD gives a closed-form solution to the (L, R) subproblem.
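The SVD-based subproblem solve can be sketched as follows (our sketch, not the authors' code): the W update is an elementwise soft threshold around D, the (L, R) update is the best rank-k approximation of W (Eckart-Young), and alternating these exact minimizations cannot increase the relaxed objective:

```python
import numpy as np

def robust_pca_split(D, k, nu=0.1, iters=50):
    LR = D.copy()
    history = []
    for _ in range(iters):
        # W-update: prox of ||D - .||_1, i.e. elementwise soft threshold around D
        Z = LR - D
        W = D + np.sign(Z) * np.maximum(np.abs(Z) - nu, 0.0)
        # (L, R)-update: closed form via the rank-k truncated SVD of W
        U, s, Vt = np.linalg.svd(W, full_matrices=False)
        LR = (U[:, :k] * s[:k]) @ Vt[:k]
        # relaxed objective ||D - W||_1 + (1/(2 nu)) ||W - LR||_F^2
        history.append(np.abs(D - W).sum() + ((W - LR) ** 2).sum() / (2.0 * nu))
    return LR, W, np.array(history)
```

On a synthetic low-rank-plus-sparse matrix D, the recorded objective values decrease monotonically and LR stays rank-k by construction.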
Result

Figure: Convergence history for exact robust PCA (objective value f(x, w) - f^* and optimality condition over iterations).
Reference I

Davis, D., Drusvyatskiy, D., and Paquette, C. (2017). The nonsmooth landscape of phase retrieval. arXiv preprint arXiv:1711.03247.

Duchi, J. C. and Ruan, F. (2017). Solving (most) of a set of quadratic equalities: Composite optimization for robust phase retrieval. arXiv preprint arXiv:1705.02356.

Rockafellar, R. T. and Wets, R. J.-B. (1998). Variational Analysis. Grundlehren der mathematischen Wissenschaften, Vol. 317. Springer, Berlin.

Zheng, P. and Aravkin, A. Y. (2018). Fast methods for nonsmooth nonconvex minimization. arXiv preprint arXiv:1802.02654.
Presented at the QMC: Operator Splitting Workshop, March 21, 2018.