Dominance-Based Pareto-Surrogate for Multi-Objective Optimization
Ilya Loshchilov¹,², Marc Schoenauer¹,², Michèle Sebag²,¹
¹ TAO Project-team, INRIA Saclay - Île-de-France
² Laboratoire de Recherche en Informatique (UMR CNRS 8623), Université Paris-Sud, 91128 Orsay Cedex, France
Simulated Evolution And Learning (SEAL-2010)
Ilya Loshchilov, Marc Schoenauer, Michèle Sebag Dominance-Based Pareto-Surrogate for Multi-Objective Optimization 1/ 1
Multi-objective CMA-ES (MO-CMA-ES)
MO-CMA-ES runs µ_mo independent (1+1)-CMA-ES instances.
Each (1+1)-CMA-ES samples one new offspring, so the temporary population has size 2µ_mo.
The µ_mo best solutions are then selected for the new population by hypervolume-based non-dominated sorting.
Finally, the CMA parameters of each individual are updated.
[Figure: objective space (Objective 1 vs. Objective 2) showing dominated points and the Pareto front.]
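The generation loop above can be sketched in a few lines of Python. This is a minimal illustration, not the actual algorithm: isotropic Gaussian mutation stands in for the full (1+1)-CMA sampling step, and the hypervolume tie-break inside a front as well as the CMA parameter update are omitted.

```python
import numpy as np

def dominates(fa, fb):
    """Pareto dominance for minimization: fa dominates fb."""
    return all(a <= b for a, b in zip(fa, fb)) and any(a < b for a, b in zip(fa, fb))

def nondominated_sort(objs):
    """Split objective vectors into successive non-dominated fronts."""
    remaining = list(range(len(objs)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(objs[j], objs[i]) for j in remaining if j != i)]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts

def generation(parents, f, mu, sigma=0.1, rng=None):
    """One simplified MO-CMA-ES generation: each of the mu parents samples
    one child (Gaussian mutation as a stand-in for the CMA step), and the
    best mu of the 2*mu pool survive via non-dominated sorting."""
    rng = rng or np.random.default_rng(0)
    children = [p + sigma * rng.standard_normal(len(p)) for p in parents]
    pool = list(parents) + children
    objs = [f(x) for x in pool]
    order = [i for front in nondominated_sort(objs) for i in front]
    return [pool[i] for i in order[:mu]]
```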
Global Surrogate Model
Goal: find a function F(x) that defines the aggregated quality of a solution x in the multi-objective case.
Idea: use F(x) for optimization or for filtering, to find promising new solutions.
An efficient SVM-based approach was recently proposed.¹
[Figure: the aggregated surrogate F_SVM in objective space (Objective 1 vs. Objective 2) and in decision space (x1, x2): dominated, Pareto, and new Pareto points; the current front lies in a band of width 2ε between the levels p − ε and p + ε.]
¹ I. Loshchilov, M. Schoenauer, M. Sebag: "A Mono Surrogate for Multiobjective Optimization", GECCO 2010.
SVM-informed EMOA: Filtering
Generate N_inform pre-children.
For each pre-child A, with nearest parent B, compute Gain(A, B) = F_svm(A) − F_svm(B).
The new child is the pre-child with the maximum value of Gain.
[Figure: decision space (x1, x2) with the true Pareto set and the SVM-approximated Pareto set.]
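The filtering step can be sketched as follows. `sample_child` and `f_svm` are placeholders for the variation operator and the learned surrogate, which are not specified at this level of detail.

```python
import numpy as np

def filter_offspring(parents, sample_child, f_svm, n_inform=10):
    """Surrogate-based filtering (sketch): among n_inform pre-children,
    keep the one with the largest surrogate gain over its nearest parent."""
    pre_children = [sample_child() for _ in range(n_inform)]

    def gain(child):
        nearest = min(parents, key=lambda p: np.linalg.norm(child - p))
        return f_svm(child) - f_svm(nearest)

    return max(pre_children, key=gain)
```

Only the selected pre-child is evaluated on the true objectives, which is where the savings in function evaluations come from.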
Support Vector Machine for Classification
Linear Classifier
[Figure: linear classifier: candidate hyperplanes L1, L2, L3; separating hyperplane ⟨w, x⟩ = b with margin 2/‖w‖ and support vectors on the levels b ± 1.]
Main Idea
Training data: D = {(x_i, y_i) | x_i ∈ R^p, y_i ∈ {−1, +1}}, i = 1, …, n.
⟨w, x_i⟩ ≥ b + ε ⇒ y_i = +1;
⟨w, x_i⟩ ≤ b − ε ⇒ y_i = −1.
Dividing by ε > 0:
⟨w, x_i⟩ − b ≥ +1 ⇒ y_i = +1;
⟨w, x_i⟩ − b ≤ −1 ⇒ y_i = −1.
Optimization Problem: Primal Form
Minimize over (w, ξ):  (1/2)‖w‖² + C Σ_{i=1}^n ξ_i
subject to: y_i(⟨w, x_i⟩ − b) ≥ 1 − ξ_i,  ξ_i ≥ 0.
Support Vector Machine for Classification
Linear Classifier
[Figure: same linear-classifier illustration as on the previous slide.]
Optimization Problem: Dual Form
By the Lagrangian formulation, instead of minimizing F alone, one minimizes F − Σ_i α_i G_i over the primal variables, subject to α_i ≥ 0 for the constraints G_i ≥ 0.
Omitting the details, the dual form is:
Maximize over α:  Σ_{i=1}^n α_i − (1/2) Σ_{i,j=1}^n α_i α_j y_i y_j ⟨x_i, x_j⟩
subject to: 0 ≤ α_i ≤ C,  Σ_{i=1}^n α_i y_i = 0.
Properties
Decision function: F(x) = sign(Σ_{i=1}^n α_i y_i ⟨x_i, x⟩ − b).
The dual form can be solved with a standard quadratic programming solver.
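To make the dual concrete, here is a toy sketch (not the solver used in the paper) that maximizes the dual by projected gradient ascent on a linearly separable dataset; the box constraint is handled by clipping and the equality constraint by projecting the gradient.

```python
import numpy as np

def svm_dual_train(X, y, C=10.0, lr=0.01, steps=2000):
    """Maximize the dual  sum(a_i) - 1/2 sum a_i a_j y_i y_j <x_i, x_j>
    by projected gradient ascent (a sketch; any QP solver would do).
    0 <= a_i <= C is enforced by clipping, sum(a_i y_i) = 0 by
    projecting the gradient onto the constraint hyperplane."""
    n = len(y)
    K = X @ X.T
    a = np.zeros(n)
    for _ in range(steps):
        grad = 1.0 - y * (K @ (a * y))      # dW/da_i = 1 - y_i sum_j a_j y_j K_ij
        grad -= y * (grad @ y) / (y @ y)    # keep sum(a_i y_i) = 0
        a = np.clip(a + lr * grad, 0.0, C)
    w = (a * y) @ X                          # recover the primal weights
    sv = (a > 1e-6) & (a < C - 1e-6)         # unbounded support vectors
    b = float(np.mean(X[sv] @ w - y[sv])) if sv.any() else 0.0
    return w, b
```

On a separable toy set the recovered hyperplane classifies the training points exactly; with kernels, `K = X @ X.T` would simply be replaced by the kernel matrix.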
Support Vector Machine for Classification
Non-Linear Classifier
[Figure (a–c): non-linear classification: Φ maps the data to a feature space where the margin boundaries ⟨w, Φ(x)⟩ − b = ±1 are linear, with margin 2/‖w‖ and support vectors.]
Non-linear classification with the "kernel trick":
Maximize over α:  Σ_{i=1}^n α_i − (1/2) Σ_{i,j=1}^n α_i α_j y_i y_j K(x_i, x_j)
subject to: α_i ≥ 0,  Σ_{i=1}^n α_i y_i = 0,
where K(x, x′) := ⟨Φ(x), Φ(x′)⟩ is the kernel function.
Decision function: F(x) = sign(Σ_{i=1}^n α_i y_i K(x_i, x) − b).
Support Vector Machine for Classification
Non-Linear Classifier: Kernels
Polynomial: k(x_i, x_j) = (⟨x_i, x_j⟩ + 1)^d
Gaussian (Radial Basis Function): k(x_i, x_j) = exp(−‖x_i − x_j‖² / 2σ²)
Hyperbolic tangent: k(x_i, x_j) = tanh(κ⟨x_i, x_j⟩ + c)
Examples for the polynomial (left) and Gaussian (right) kernels: [figures omitted]
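The three kernels are one-liners in Python; note the sign convention in the Gaussian kernel (the exponent is negative), and that the tanh kernel is only a valid (positive semi-definite) kernel for some parameter settings.

```python
import numpy as np

def poly_kernel(xi, xj, d=3):
    """Polynomial kernel (<xi, xj> + 1)^d."""
    return (np.dot(xi, xj) + 1.0) ** d

def rbf_kernel(xi, xj, sigma=1.0):
    """Gaussian (RBF) kernel exp(-||xi - xj||^2 / (2 sigma^2))."""
    diff = np.asarray(xi, dtype=float) - np.asarray(xj, dtype=float)
    return np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2))

def tanh_kernel(xi, xj, kappa=1.0, c=0.0):
    """Hyperbolic tangent kernel; not PSD for all (kappa, c), used heuristically."""
    return np.tanh(kappa * np.dot(xi, xj) + c)
```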
Ranking Support Vector Machine
Find F(x) that preserves the ordering of the training points.
[Figure: training points x projected onto w, with rank level sets L(r1), L(r2).]
Ranking Support Vector Machine
A simplified formulation with a linear number of constraints (one per point), where 1 rank = 1 point.
Primal problem
Minimize over (w, ξ):  (1/2)‖w‖² + Σ_{i=1}^{N−1} C_i ξ_i
subject to:
⟨w, Φ(x_i) − Φ(x_{i+1})⟩ ≥ 1 − ξ_i  (i = 1 … N − 1)
ξ_i ≥ 0  (i = 1 … N − 1)
Dual problem
Maximize over α:  Σ_{i=1}^{N−1} α_i − (1/2) Σ_{i,j=1}^{N−1} α_i α_j K(x_i − x_{i+1}, x_j − x_{j+1})
subject to: 0 ≤ α_i ≤ C_i  (i = 1 … N − 1)
Rank Surrogate Function
F(x) = Σ_{i=1}^{N−1} α_i (K(x_i, x) − K(x_{i+1}, x))
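Given the dual coefficients α_i, evaluating the rank surrogate is a direct transcription of the formula above. In this sketch, `kernel` is a hypothetical callable and the α_i are assumed to come from a dual solver.

```python
def rank_surrogate(alphas, X_sorted, kernel):
    """Build F(x) = sum_i alpha_i (K(x_i, x) - K(x_{i+1}, x)) from a
    training set X_sorted ranked best-first (sketch)."""
    def F(x):
        return sum(a * (kernel(X_sorted[i], x) - kernel(X_sorted[i + 1], x))
                   for i, a in enumerate(alphas))
    return F
```

With a linear kernel on ranked 1-D points, F reproduces the ranking: higher-ranked inputs get strictly larger surrogate values.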
Dominance-Based Surrogate
Rank Support Vector Machine
Goal: find a function F(x) such that x_i ≻ x_j ⇒ F(x_i) > F(x_j), where "≻" denotes the Pareto-dominance relation.
Such an F(x) is invariant under any "≻"-preserving transformation of the objective functions.
The hypervolume indicator, in contrast, is not invariant, at least in its current formulation.
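This invariance can be checked mechanically: applying a strictly increasing transformation to each objective leaves every dominance relation, and hence every constraint on F, unchanged. A small sketch:

```python
import math

def dominates(fa, fb):
    """Pareto dominance (minimization): fa dominates fb."""
    return all(a <= b for a, b in zip(fa, fb)) and any(a < b for a, b in zip(fa, fb))

# A strictly increasing transform of each objective preserves the relation.
transform = lambda f: [math.exp(f[0]), f[1] ** 3]
a, b = [1.0, 4.0], [2.0, 5.0]
assert dominates(a, b) == dominates(transform(a), transform(b))
```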
Dominance-Based Surrogate
Model complexity: how to choose the constraints?
Learning all possible ≻ relations may be too expensive.
Learning only the Primary constraints to build a basic model is a reasonable choice.
Additionally learning a small number of the most violated Secondary constraints makes the model smoother.
[Figure: objective space sketch of F_SVM over points a–f, with Primary and Secondary "≻" constraints.]
Dominance-Based Surrogate
Primary and Secondary constraints
Primary dominance constraints are associated with pairs (x_i, x_j) such that x_j is the nearest neighbor of x_i in objective space, conditionally on x_i dominating x_j.
Secondary dominance constraints are associated with pairs (x_i, x_j) such that x_i belongs to the current Pareto front and x_j belongs to another non-dominated front.
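One possible reading of these definitions, as a sketch: primary pairs link each point to the nearest point it dominates; secondary pairs here link Pareto points to all points of later fronts (not already used as a primary pair).

```python
import numpy as np

def dominates(fa, fb):
    return all(a <= b for a, b in zip(fa, fb)) and any(a < b for a, b in zip(fa, fb))

def build_constraints(objs):
    """Sketch: primary pairs (i, j) with j the nearest (objective-space)
    point dominated by i; secondary pairs link the Pareto front to the
    remaining (dominated) points."""
    n = len(objs)
    pareto = [i for i in range(n)
              if not any(dominates(objs[j], objs[i]) for j in range(n) if j != i)]
    primary = []
    for i in range(n):
        dominated = [j for j in range(n) if dominates(objs[i], objs[j])]
        if dominated:
            jj = min(dominated,
                     key=lambda j: np.linalg.norm(np.array(objs[i]) - np.array(objs[j])))
            primary.append((i, jj))
    secondary = [(i, j) for i in pareto for j in range(n)
                 if j not in pareto and (i, j) not in primary]
    return primary, secondary
```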
Construction of the surrogate model
Initialize the archive Ω_active as the set of Primary constraints, and Ω_passive as the set of Secondary constraints.
Optimize the model for 1000 |Ω_active| iterations.
Add the most violated passive constraint from Ω_passive to Ω_active and optimize the model for 10 |Ω_active| iterations.
Repeat the last step 0.1 |Ω_active| times.
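The schedule above can be sketched as a small driver loop, with `optimize` and `violation` as placeholders for the actual Rank-SVM update and the margin-violation measure.

```python
def train_active_set(active, passive, optimize, violation):
    """Active-set schedule from the slide: train on the primary constraints,
    then repeatedly promote the most violated secondary constraint.
    optimize(constraints, n_iter) and violation(constraint) are stand-ins."""
    active = list(active)
    passive = list(passive)
    optimize(active, 1000 * len(active))
    for _ in range(max(1, int(0.1 * len(active)))):
        if not passive:
            break
        worst = max(passive, key=violation)   # most violated passive constraint
        passive.remove(worst)
        active.append(worst)
        optimize(active, 10 * len(active))
    return active
```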
Experimental Validation
Parameters
Surrogate models:
ASM: aggregated surrogate model based on One-Class SVM and Regression SVM²
RASM: the proposed Rank-based SVM
SVM learning:
Number of training points: at most N_training = 1000
Number of iterations: 1000 |Ω_active| + |Ω_active|² ≈ 2 N_training²
Kernel function: RBF, with σ equal to the average distance between the training points
Cost of constraint violation: C = 1000
Offspring selection procedure:
Number of pre-children: p = 2 and p = 10
² I. Loshchilov, M. Schoenauer, M. Sebag: "A Mono Surrogate for Multiobjective Optimization", GECCO 2010.
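One plausible reading of "σ equal to the average distance of the training points" is the mean pairwise Euclidean distance, which can be computed as follows (an assumption, sketched for at least two points):

```python
import numpy as np

def rbf_sigma(X):
    """RBF width as the mean pairwise Euclidean distance of the training
    points (one reading of the setup; assumes len(X) >= 2)."""
    X = np.asarray(X, dtype=float)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    n = len(X)
    return d.sum() / (n * (n - 1))  # mean over the off-diagonal pairs
```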
Experimental Validation
Results
Table 1. Comparative results of two baseline EMOAs, S-NSGA-II and MO-CMA-ES, and their ASM and RASM variants. Median number of function evaluations (out of 10 independent runs) to reach the ∆H_target values, normalized by Best: a value of 1 indicates the best result; a value X > 1 indicates that the corresponding algorithm needed X times more evaluations than the best to reach the same precision.
                     ZDT1                          ZDT2
∆Htarget             1    0.1  0.01 1e-3 1e-4   |  1    0.1  0.01 1e-3 1e-4
Best                 1100 3000 5300 7800 38800  |  1400 4200 6600 8500 32700
S-NSGA-II            1.6  2    2    2.3  1.1    |  1.8  1.7  1.8  2.3  1.2
ASM-NSGA p=2         1.2  1.5  1.4  1.5  1.5    |  1.2  1.2  1.2  1.4  1
ASM-NSGA p=10        1    1    1    1    .      |  1    1    1    1    .
RASM-NSGA p=2        1.2  1.4  1.4  1.6  1      |  1.3  1.2  1.2  1.5  1
RASM-NSGA p=10       1    1.1  1.1  1.5  .      |  1.1  1    1    1.2  .
MO-CMA-ES            16.5 14.4 12.3 11.3 .      |  14.7 10.7 10   10.1 .
ASM-MO-CMA p=2       6.8  8.5  8.3  8    .      |  5.9  8.2  7.7  7.5  .
ASM-MO-CMA p=10      6.9  10.1 10.4 12.1 .      |  5    .    .    .    .
RASM-MO-CMA p=2      5.1  7.7  7.6  7.4  .      |  5.2  .    .    .    .
RASM-MO-CMA p=10     3.6  4.3  4.9  7.2  .      |  3.2  .    .    .    .

                     IHR1                          IHR2
∆Htarget             1    0.1  0.01  1e-3  1e-4 |  1    0.1  0.01  1e-3  1e-4
Best                 500  2000 35300 41200 50300|  1700 7000 12900 52900 .
S-NSGA-II            1.6  1.5  .     .     .    |  1.1  3.2  6.2   .     .
ASM-NSGA p=2         1.2  1.3  .     .     .    |  1    3.9  4.9   .     .
ASM-NSGA p=10        1    1.5  .     .     .    |  1.4  6.4  4.6   .     .
RASM-NSGA p=2        1.2  1.2  .     .     .    |  1.5  .    .     .     .
RASM-NSGA p=10       1    1    .     .     .    |  1.2  5.1  4.8   .     .
MO-CMA-ES            8.2  6.5  1.1   1.2   1.2  |  5.8  2.7  2.1   1     .
ASM-MO-CMA p=2       4.6  2.9  1     1     1    |  3.1  1.6  1.4   1.1   .
ASM-MO-CMA p=10      9.2  6.1  1.3   1.2   .    |  5.9  2.6  2.4   .     .
RASM-MO-CMA p=2      2.6  2.3  2.4   2.1   .    |  2.2  1    1     .     .
RASM-MO-CMA p=10     1.8  1.9  .     .     .    |  .    .    .     .     .

(A "." entry indicates that the corresponding target was not reached.)
Experimental Validation
Comparing the original and SVM-informed versions of NSGA-II and MO-CMA-ES on the ZDT and IHR problems shows:
The SVM-informed versions are about 1.5 times faster for p = 2 and 2-5 times faster for p = 10, up to the point where the algorithm finds nearly-optimal Pareto points.
Premature convergence of the approximation of the optimal µ-distribution is observed, because the global surrogate model addresses only convergence, not diversity.
Summary
The proposed aggregated surrogate model is invariant under ≻-preserving transformations of the objective functions.
The speed-up is significant, but limited to the convergence phase toward the optimal Pareto front.
The model can incorporate "any" kind of preference: extreme points, "=" relations, hypervolume contribution, decision-maker-defined ≻ relations.
[Figure: objective space sketch of F_SVM over points a–g, combining Primary and Secondary "≻" constraints with Primary and Secondary "=" constraints.]
Thank you for your attention!
Questions?
