Optimization
COS 323
Ingredients
• Objective function
• Variables
• Constraints
Find values of the variables that minimize or maximize the objective function while satisfying the constraints
Different Kinds of Optimization
Figure from: Optimization Technology Center
http://www-fp.mcs.anl.gov/otc/Guide/OptWeb/
Different Optimization Techniques
• Algorithms have very different flavor depending on specific problem
  – Closed form vs. numerical vs. discrete
  – Local vs. global minima
  – Running times ranging from O(1) to NP-hard
• Today:
  – Focus on continuous numerical methods
Optimization in 1-D
• Look for analogies to bracketing in root-finding
• What does it mean to bracket a minimum?
  [Figure: three points (x_left, f(x_left)), (x_mid, f(x_mid)), (x_right, f(x_right)) on the curve]
  x_left < x_mid < x_right
  f(x_mid) < f(x_left)
  f(x_mid) < f(x_right)
Optimization in 1-D
• Once we have these properties, there is at least one local minimum between x_left and x_right
• Establishing bracket initially (see the sketch below):
  – Given x_initial, increment
  – Evaluate f(x_initial), f(x_initial + increment)
  – If decreasing, step until find an increase
  – Else, step in opposite direction until find an increase
  – Grow increment at each step
• For maximization: substitute –f for f
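A minimal Python sketch of this bracketing procedure (the function name, the growth factor of 2, and the step cap are illustrative choices, not from the slides):

    def bracket_minimum(f, x0, increment=0.1, grow=2.0, max_steps=50):
        """Expand outward from x0 until three points bracket a local minimum."""
        a, b = x0, x0 + increment
        fa, fb = f(a), f(b)
        if fb > fa:                      # not decreasing: search the other way
            a, b = b, a
            fa, fb = fb, fa
            increment = -increment
        for _ in range(max_steps):
            increment *= grow            # grow the increment at each step
            c = b + increment
            fc = f(c)
            if fc > fb:                  # f(a) > f(b) < f(c): bracket found
                return (a, b, c) if a < c else (c, b, a)
            a, b, fb = b, c, fc
        raise RuntimeError("no bracket found; function may be monotonic")

    # usage: bracket_minimum(lambda x: (x - 3.0)**2, 0.0) -> roughly (1.5, 3.1, 6.3)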
Optimization in 1-D
• Strategy: evaluate function at some x_new
  [Figure: bracket points (x_left, f(x_left)), (x_mid, f(x_mid)), (x_right, f(x_right)) and the new point (x_new, f(x_new))]
Optimization in 1-D
• Strategy: evaluate function at some x_new
  – Here, new “bracket” points are x_new, x_mid, x_right
Optimization in 1-D
• Strategy: evaluate function at some x_new
  – Here, new “bracket” points are x_left, x_new, x_mid
Optimization in 1-D
• Unlike with root-finding, can’t always guarantee that interval will be reduced by a factor of 2
• Let’s find the optimal place for x_mid, relative to left and right, that will guarantee same factor of reduction regardless of outcome
Optimization in 1-D
if f(x_new) < f(x_mid)
    new interval = α
else
    new interval = 1 – α²
[Figure: interval with segments of length α and α²]
Golden Section Search
• To assure same interval, want α = 1 – α²
• So,
  \alpha = \frac{\sqrt{5} - 1}{2} \approx 0.618
• This is the “golden ratio” = 0.618…
• So, interval decreases by a factor of about 0.618 (roughly 38%) per iteration
  – Linear convergence
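A sketch of golden section search on a bracketed interval (names and the tolerance are illustrative; assumes a single minimum inside [a, b]):

    import math

    PHI = (math.sqrt(5.0) - 1.0) / 2.0   # 0.618..., the reduction factor per iteration

    def golden_section(f, a, b, tol=1e-8):
        """Minimize f on [a, b] assuming one local minimum inside the interval."""
        # interior points placed at the golden ratio so one can be reused each step
        x1 = b - PHI * (b - a)
        x2 = a + PHI * (b - a)
        f1, f2 = f(x1), f(x2)
        while b - a > tol:
            if f1 < f2:                  # minimum lies in [a, x2]
                b, x2, f2 = x2, x1, f1
                x1 = b - PHI * (b - a)
                f1 = f(x1)
            else:                        # minimum lies in [x1, b]
                a, x1, f1 = x1, x2, f2
                x2 = a + PHI * (b - a)
                f2 = f(x2)
        return 0.5 * (a + b)

    # usage: golden_section(lambda x: (x - 3.0)**2, 0.0, 10.0) -> approx. 3.0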
Error Tolerance
• Around minimum, derivative = 0, so
  f(x + \Delta x) = f(x) + \tfrac{1}{2} f''(x)\,\Delta x^2 + \ldots
  f(x + \Delta x) - f(x) = \tfrac{1}{2} f''(x)\,\Delta x^2 = \varepsilon_{\text{machine}} \;\Rightarrow\; \Delta x \sim \sqrt{\varepsilon}
• Rule of thumb: pointless to ask for more accuracy than sqrt(ε)
  – Can use double precision if you want a single-precision result (and/or have single-precision data)
Faster 1-D Optimization
• Trade off super-linear convergence for worse robustness
  – Combine with Golden Section search for safety
• Usual bag of tricks:
  – Fit parabola through 3 points, find minimum (closed form below)
  – Compute derivatives as well as positions, fit cubic
  – Use second derivatives: Newton
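For the parabola trick, the minimum of the parabola through (a, f(a)), (b, f(b)), (c, f(c)) has a closed form (this is the standard successive-parabolic-interpolation step; the a, b, c notation is mine, not from the slides):

  x_{\min} = b - \frac{1}{2}\,\frac{(b-a)^2\,[f(b)-f(c)] - (b-c)^2\,[f(b)-f(a)]}{(b-a)\,[f(b)-f(c)] - (b-c)\,[f(b)-f(a)]}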
Newton’s Method
[Figures: successive iterations of Newton’s method on a 1-D function]
Newton’s Method
• At each step:
  x_{k+1} = x_k - \frac{f'(x_k)}{f''(x_k)}
• Requires 1st and 2nd derivatives
• Quadratic convergence
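A sketch of this 1-D Newton iteration (assumes the caller supplies f' and f''; names and the stopping test are illustrative):

    def newton_1d(fprime, fsecond, x0, tol=1e-10, max_iter=50):
        """Find a stationary point of f by Newton's method on f'(x) = 0."""
        x = x0
        for _ in range(max_iter):
            step = fprime(x) / fsecond(x)
            x -= step
            if abs(step) < tol:
                return x
        return x  # may not have converged; check f''(x) > 0 to confirm a minimum

    # usage on f(x) = x**4 - 3*x, i.e. f' = 4x**3 - 3 and f'' = 12x**2:
    # newton_1d(lambda x: 4*x**3 - 3, lambda x: 12*x**2, x0=1.0) -> approx. 0.9086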
Multi-Dimensional Optimization
• Important in many areas
  – Fitting a model to measured data
  – Finding best design in some parameter space
• Hard in general
  – Weird shapes: multiple extrema, saddles, curved or elongated valleys, etc.
  – Can’t bracket
• In general, easier than rootfinding
  – Can always walk “downhill”
Newton’s Method in Multiple Dimensions
• Replace 1st derivative with gradient, 2nd derivative with Hessian
  f = f(x, y)
  \nabla f = \begin{pmatrix} \partial f / \partial x \\ \partial f / \partial y \end{pmatrix}, \qquad
  H = \begin{pmatrix} \partial^2 f / \partial x^2 & \partial^2 f / \partial x\,\partial y \\ \partial^2 f / \partial y\,\partial x & \partial^2 f / \partial y^2 \end{pmatrix}
Newton’s Method in Multiple Dimensions
• Replace 1st derivative with gradient, 2nd derivative with Hessian
• So,
  x_{k+1} = x_k - H^{-1}(x_k)\,\nabla f(x_k)
• Tends to be extremely fragile unless function very smooth and starting close to minimum
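A sketch of the multi-dimensional update, solving H·step = ∇f rather than forming H⁻¹ explicitly (function names and the convergence test are illustrative):

    import numpy as np

    def newton_nd(grad, hess, x0, tol=1e-10, max_iter=50):
        """Newton's method in n dimensions: x <- x - H(x)^-1 grad(x)."""
        x = np.asarray(x0, dtype=float)
        for _ in range(max_iter):
            step = np.linalg.solve(hess(x), grad(x))   # solve H * step = grad
            x = x - step
            if np.linalg.norm(step) < tol:
                break
        return x

    # usage on f(x, y) = (x - 1)**2 + 10*(y + 2)**2:
    # grad = lambda v: np.array([2*(v[0] - 1), 20*(v[1] + 2)])
    # hess = lambda v: np.array([[2.0, 0.0], [0.0, 20.0]])
    # newton_nd(grad, hess, [0.0, 0.0]) -> [1., -2.] (one step, since f is quadratic)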
Important classification of methods
• Use function + gradient + Hessian (Newton)
• Use function + gradient (most descent methods)
• Use function values only (Nelder-Mead, also called the “simplex” or “amoeba” method)
Steepest Descent Methods
• What if you can’t / don’t want to use 2nd derivative?
• “Quasi-Newton” methods estimate Hessian
• Alternative: walk along (negative of) gradient… (sketched below)
  – Perform 1-D minimization along line passing through current point in the direction of the gradient
  – Once done, re-compute gradient, iterate
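A sketch of steepest descent with a 1-D line minimization at each step; for brevity the line search is delegated to SciPy's scalar minimizer (the function names and tolerances are illustrative):

    import numpy as np
    from scipy.optimize import minimize_scalar

    def steepest_descent(f, grad, x0, tol=1e-6, max_iter=500):
        """Repeatedly minimize f along the negative gradient direction."""
        x = np.asarray(x0, dtype=float)
        for _ in range(max_iter):
            g = grad(x)
            if np.linalg.norm(g) < tol:
                break
            # quick-and-dirty 1-D minimization of f along the line x - t * g
            t = minimize_scalar(lambda t: f(x - t * g)).x
            x = x - t * g
        return x

    # usage: steepest_descent(lambda v: v[0]**2 + 10*v[1]**2,
    #                         lambda v: np.array([2*v[0], 20*v[1]]),
    #                         [3.0, 1.0])   -> close to [0, 0]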
Problem With Steepest Descent
[Figures: steepest descent path on a function with a long, narrow valley]
Conjugate Gradient Methods
• Idea: avoid “undoing” minimization that’s already been done
• Walk along direction
  d_{k+1} = -g_{k+1} + \beta_k d_k
• Polak and Ribiere formula:
  \beta_k = \frac{g_{k+1}^{\mathsf T} (g_{k+1} - g_k)}{g_k^{\mathsf T} g_k}
Conjugate Gradient Methods
• Conjugate gradient implicitly obtains information about Hessian
• For quadratic function in n dimensions, gets exact solution in n steps (ignoring roundoff error)
• Works well in practice…
Value-Only Methods in Multi-Dimensions
• If can’t evaluate gradients, life is hard
• Can use approximate (numerically evaluated) gradients:
  \nabla f(x) = \begin{pmatrix} \partial f / \partial e_1 \\ \partial f / \partial e_2 \\ \partial f / \partial e_3 \\ \vdots \end{pmatrix}
  \approx \begin{pmatrix} \bigl(f(x + \delta\,e_1) - f(x)\bigr)/\delta \\ \bigl(f(x + \delta\,e_2) - f(x)\bigr)/\delta \\ \bigl(f(x + \delta\,e_3) - f(x)\bigr)/\delta \\ \vdots \end{pmatrix}
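A sketch of this forward-difference gradient (the step δ = 1e-6 is an illustrative default; cf. the √ε rule of thumb earlier):

    import numpy as np

    def approx_gradient(f, x, delta=1e-6):
        """Forward-difference approximation of the gradient of f at x."""
        x = np.asarray(x, dtype=float)
        f0 = f(x)
        g = np.empty_like(x)
        for i in range(x.size):
            e = np.zeros_like(x)
            e[i] = delta
            g[i] = (f(x + e) - f0) / delta   # (f(x + delta*e_i) - f(x)) / delta
        return g

    # usage: approx_gradient(lambda v: v[0]**2 + 3*v[1], np.array([1.0, 2.0])) -> approx. [2, 3]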
Generic Optimization Strategies
• Uniform sampling:
  – Cost rises exponentially with # of dimensions
• Simulated annealing (sketched below):
  – Search in random directions
  – Start with large steps, gradually decrease
  – “Annealing schedule” – how fast to cool?
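A minimal simulated-annealing sketch of the idea above (the Gaussian proposal, geometric cooling factor, and acceptance rule are common choices, not prescribed by the slides):

    import math
    import random

    def simulated_annealing(f, x0, step=1.0, t0=1.0, cooling=0.95, iters=2000):
        """Random-direction search that sometimes accepts uphill moves early on."""
        x, fx, temp = list(x0), f(x0), t0
        best_x, best_f = x, fx
        for _ in range(iters):
            # propose a random step; steps shrink as the temperature drops
            cand = [xi + random.gauss(0.0, step * temp) for xi in x]
            fc = f(cand)
            # always accept improvements; accept some uphill moves while hot
            if fc < fx or random.random() < math.exp(-(fc - fx) / max(temp, 1e-12)):
                x, fx = cand, fc
                if fc < best_f:
                    best_x, best_f = cand, fc
            temp *= cooling                  # the "annealing schedule"
        return best_x, best_f

    # usage: simulated_annealing(lambda v: (v[0] - 1)**2 + (v[1] + 2)**2, [5.0, 5.0])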
Downhill Simplex Method (Nelder-Mead)
• Keep track of n+1 points in n dimensions
  – Vertices of a simplex (triangle in 2D, tetrahedron in 3D, etc.)
• At each iteration: simplex can move, expand, or contract
  – Sometimes known as amoeba method: simplex “oozes” along the function
Downhill Simplex Method (Nelder-Mead)
• Basic operation: reflection
  [Figure: the worst point (highest function value) and the location probed by the reflection step]
Downhill Simplex Method (Nelder-Mead)
• If reflection resulted in best (lowest) value so far, try an expansion
• Else, if reflection helped at all, keep it
  [Figure: location probed by the expansion step]
Downhill Simplex Method (Nelder-Mead)
• If reflection didn’t help (reflected point still worst), try a contraction
  [Figure: location probed by the contraction step]
Downhill Simplex Method (Nelder-Mead)
• If all else fails, shrink the simplex around the best point
Downhill Simplex Method (Nelder-Mead)
• Method fairly efficient at each iteration (typically 1-2 function evaluations)
• Can take lots of iterations
• Somewhat flaky – sometimes needs restart after simplex collapses on itself, etc.
• Benefits: simple to implement, doesn’t need derivative, doesn’t care about function smoothness, etc. (example on Rosenbrock’s function after the next slide)
Rosenbrock’s Function
• Designed specifically for testing optimization techniques
• Curved, narrow valley
  f(x, y) = 100\,(y - x^2)^2 + (1 - x)^2
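A short sketch tying the last two slides together: Rosenbrock's function minimized with SciPy's Nelder-Mead implementation (the starting point (-1.2, 1) is a conventional test start, not from the slides):

    import numpy as np
    from scipy.optimize import minimize

    def rosenbrock(v):
        """f(x, y) = 100 (y - x^2)^2 + (1 - x)^2, minimum at (1, 1)."""
        x, y = v
        return 100.0 * (y - x**2)**2 + (1.0 - x)**2

    result = minimize(rosenbrock, x0=np.array([-1.2, 1.0]), method='Nelder-Mead')
    print(result.x)   # close to [1., 1.]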
Constrained Optimization
• Equality constraints: optimize f(x) subject to g_i(x) = 0
• Method of Lagrange multipliers: convert to a higher-dimensional problem
• Minimize f(x) + \sum_i \lambda_i g_i(x) w.r.t. (x_1 \ldots x_n;\ \lambda_1 \ldots \lambda_k)
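A small worked example (mine, not from the slides): minimize f(x, y) = x^2 + y^2 subject to g(x, y) = x + y - 1 = 0. Setting the gradient of f + λg to zero gives

  2x + \lambda = 0, \quad 2y + \lambda = 0, \quad x + y - 1 = 0 \;\Rightarrow\; x = y = \tfrac{1}{2},\ \lambda = -1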
Constrained Optimization
• Inequality constraints are harder…
• If objective function and constraints all linear, this is “linear programming”
• Observation: minimum must lie at corner of region formed by constraints
• Simplex method: move from vertex to vertex, minimizing objective function
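A sketch using SciPy's linear-programming solver on a tiny made-up problem; note that the optimum lands at a vertex of the feasible region, as the slide observes:

    import numpy as np
    from scipy.optimize import linprog

    # minimize  c^T x  subject to  A_ub x <= b_ub  and  x >= 0
    c = np.array([-1.0, -2.0])            # maximize x + 2y by minimizing its negative
    A_ub = np.array([[1.0, 1.0],
                     [1.0, 3.0]])
    b_ub = np.array([4.0, 6.0])
    result = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
    print(result.x)   # -> [3., 1.], a vertex of the feasible polygon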
Constrained Optimization
• General “nonlinear programming” hard
• Algorithms for special cases (e.g. quadratic)
Global Optimization
• In general, can’t guarantee that you’ve found global (rather than local) minimum
• Some heuristics:
  – Multi-start: try local optimization from several starting positions (sketch below)
  – Very slow simulated annealing
  – Use analytical methods (or graphing) to determine behavior, guide methods to correct neighborhoods
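A minimal multi-start sketch (the number of starts, the random sampling of starting points, and the choice of Nelder-Mead as the local optimizer are all illustrative):

    import numpy as np
    from scipy.optimize import minimize

    def multi_start(f, bounds, n_starts=20, seed=0):
        """Run a local optimizer from several random starts; keep the best result."""
        rng = np.random.default_rng(seed)
        lo = np.array([b[0] for b in bounds])
        hi = np.array([b[1] for b in bounds])
        best = None
        for _ in range(n_starts):
            x0 = rng.uniform(lo, hi)                       # random starting position
            res = minimize(f, x0, method='Nelder-Mead')    # local optimization
            if best is None or res.fun < best.fun:
                best = res
        return best

    # usage on a function with many local minima (returns the best local result found):
    # multi_start(lambda v: np.sin(3 * v[0]) + (v[0] / 3)**2, bounds=[(-10, 10)])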