The Foundations of (Machine) Learning
Stefan Kühn
codecentric AG
CSMLS Meetup Hamburg - February 26th, 2015
Contents
1 Supervised Machine Learning
2 Optimization Theory
3 Concrete Optimization Methods
1 Supervised Machine Learning
Setting
Supervised Learning Approach
Use labeled training data to fit a given model, i.e. to learn from the given data.
Typical Problems:
Classification - discrete output
Logistic Regression
Neural Networks
Support Vector Machines
Regression - continuous output
Linear Regression
Support Vector Regression
Generalized Linear/Additive Models
Training and Learning
Ingredients:
Training Data Set
Model, e.g. Logistic Regression
Error Measure, e.g. Mean Squared Error
Learning Procedure:
Derive objective function from Model and Error Measure
Initialize Model parameters
Find a good fit!
Iterate with other initial parameters
What is Learning in this context?
Learning is nothing but the application of an algorithm for unconstrained
optimization to the given objective function.
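To make this concrete, here is a minimal sketch in Python of how a model and an error measure combine into an objective function that an optimizer can then minimize; the linear model, the MSE choice, and the helper names (`predict`, `objective`, `gradient`) are illustrative assumptions, not part of the slides:

```python
import numpy as np

def predict(theta, X):
    """Hypothetical linear model: predictions are X @ theta."""
    return X @ theta

def objective(theta, X, y):
    """Error measure: mean squared error between predictions and labels."""
    residuals = predict(theta, X) - y
    return np.mean(residuals ** 2)

def gradient(theta, X, y):
    """Gradient of the MSE objective with respect to theta."""
    residuals = predict(theta, X) - y
    return 2.0 / len(y) * X.T @ residuals

# "Learning" then means: choose initial parameters and hand
# objective/gradient to an unconstrained optimization algorithm.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)
theta0 = np.zeros(3)
print(objective(theta0, X, y))
```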
2 Optimization Theory
Unconstrained Optimization
Higher-order methods
Newton’s method (fast local convergence)
Gradient-based methods
Gradient Descent / Steepest Descent (globally convergent)
Conjugate Gradient (globally convergent)
Gauß-Newton, Levenberg-Marquardt, Quasi-Newton
Krylov subspace methods
Derivative-free methods, direct search
Secant method (locally convergent)
Regula Falsi and successors (global convergence, typically slow)
Nelder-Mead / Downhill-Simplex
unconventional method, creates a moving simplex
driven by reflection/contraction/expansion of the corner points
convergence guarantees only under restrictive assumptions; may fail even for smooth functions f ∈ C^1
General Iterative Algorithmic Scheme
Goal: Minimize a given function f:
min f(x), x ∈ R^n
Iterative Algorithms
Starting from a given point, an iterative algorithm tries to minimize the objective function step by step.
Preparation: k = 0
Initialization: Choose initial points and parameters
Iterate until convergence: k = 1, 2, 3, . . .
Termination criterion: Check optimality of the current iterate
Descent Direction: Find reasonable search direction
Stepsize: Determine length of the step in the given direction
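The scheme translates into a generic loop; a minimal sketch, where `search_direction`, `step_size`, and `converged` are hypothetical callables standing in for the concrete choices discussed on the following slides:

```python
import numpy as np

def minimize(f, grad, x0, search_direction, step_size, converged, maxiter=1000):
    """Generic iterative minimization scheme:
    check termination, pick a descent direction, pick a stepsize, update."""
    x = np.asarray(x0, dtype=float)
    for k in range(maxiter):
        g = grad(x)
        if converged(x, g):              # termination criterion
            break
        d = search_direction(x, g)       # descent direction
        t = step_size(f, grad, x, d)     # stepsize (line search)
        x = x + t * d                    # update the iterate
    return x
```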
Termination criteria
Critical points x∗:
∇f(x∗) = 0
Gradient: Should converge to zero
‖∇f(x_k)‖ < tol
Iterates: Distance between x_k and x_{k+1} should converge to zero
‖x_k − x_{k+1}‖ < tol
Function Values: Difference between f(x_k) and f(x_{k+1}) should converge to zero
|f(x_k) − f(x_{k+1})| < tol
Number of iterations: Terminate after maxiter iterations
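One possible way to combine these criteria in code; the common tolerance and the or-combination are assumptions made for illustration:

```python
import numpy as np

def converged(x_old, x_new, f_old, f_new, g_new, tol=1e-6):
    """Return True if any of the standard termination criteria fires."""
    small_gradient = np.linalg.norm(g_new) < tol          # ||grad f(x_k)|| < tol
    small_step     = np.linalg.norm(x_new - x_old) < tol  # ||x_k - x_{k+1}|| < tol
    small_decrease = abs(f_old - f_new) < tol             # |f(x_k) - f(x_{k+1})| < tol
    return small_gradient or small_step or small_decrease
```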
Descent direction
Geometric interpretation
d is a descent direction if and only if the angle α between the gradient ∇f(x) and d is in a certain range:
π/2 = 90° < α < 270° = 3π/2
Algebraic equivalent
The sign of the scalar product between two vectors a and b is determined by the cosine of the angle α between a and b:
⟨a, b⟩ = a^T b = ‖a‖ ‖b‖ cos α(a, b)
d is a descent direction if and only if:
d^T ∇f(x) < 0
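The algebraic characterization is directly checkable in code; a small sketch:

```python
import numpy as np

def is_descent_direction(grad_x, d):
    """d is a descent direction at x iff d^T grad f(x) < 0."""
    return float(np.dot(d, grad_x)) < 0.0

# The negative gradient always passes the test (as long as grad f(x) != 0):
g = np.array([1.0, -2.0])
print(is_descent_direction(g, -g))  # True
```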
Stepsize
Armijo’s rule
Takes two parameters 0 < σ < 1 and 0 < ρ < 0.5.
For ℓ = 0, 1, 2, . . . test the Armijo condition:
f(p + σ^ℓ d) < f(p) + ρ σ^ℓ d^T ∇f(p)
Accepted stepsize
The first ℓ that passes this test determines the accepted stepsize:
t = σ^ℓ
Standard Armijo implies that the accepted stepsize always satisfies t ≤ 1, so it is only semi-efficient.
Technical detail: Widening
Test whether some t > 1 satisfies the Armijo condition, i.e. check ℓ = −1, −2, . . . as well; this ensures efficiency.
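A minimal backtracking sketch of Armijo's rule including the widening step; the parameter defaults, the iteration caps, and the helper name `armijo_stepsize` are assumptions made for illustration:

```python
import numpy as np

def armijo_stepsize(f, grad, p, d, sigma=0.5, rho=1e-2, max_tries=50):
    """Backtracking: return t = sigma**l for the first l = 0, 1, 2, ...
    satisfying f(p + t*d) < f(p) + rho * t * d^T grad f(p)."""
    fp = f(p)
    slope = rho * np.dot(d, grad(p))   # rho * d^T grad f(p), negative for a descent d
    for l in range(max_tries):
        t = sigma ** l
        if f(p + t * d) < fp + t * slope:
            if l == 0:
                # Widening: the full step t = 1 passed, so also try
                # l = -1, -2, ... (i.e. t = 1/sigma, 1/sigma**2, ... > 1)
                # as long as the Armijo condition still holds.
                while t < 1e6 and f(p + (t / sigma) * d) < fp + (t / sigma) * slope:
                    t = t / sigma
            return t
    return sigma ** max_tries  # fallback: accept a very small step
```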
3 Concrete Optimization Methods
Gradient Descent
Descent direction
Direction of Steepest Descent, the negative gradient:
d = −∇f(x)
Motivation:
corresponds to α = 180° = π
obvious choice, always a descent direction, no test needed
guarantees the quickest win locally
works with inexact line search, e.g. Armijo’s rule
works for functions f ∈ C^1
always solves the auxiliary optimization problem
min s^T ∇f(x), s ∈ R^n, ‖s‖ = 1
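Putting the pieces together, a sketch of steepest descent with an Armijo line search, assuming the hypothetical `armijo_stepsize` helper from the earlier sketch is in scope:

```python
import numpy as np

def gradient_descent(f, grad, x0, tol=1e-6, maxiter=10_000):
    """Steepest descent: d = -grad f(x), stepsize from Armijo backtracking."""
    x = np.asarray(x0, dtype=float)
    for _ in range(maxiter):
        g = grad(x)
        if np.linalg.norm(g) < tol:      # gradient-based termination
            break
        d = -g                           # direction of steepest descent
        t = armijo_stepsize(f, grad, x, d)
        x = x + t * d
    return x

# Example: minimize the simple quadratic f(x) = ||x||^2.
f = lambda x: float(np.dot(x, x))
grad = lambda x: 2.0 * x
print(gradient_descent(f, grad, np.array([3.0, -4.0])))
```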
Conjugate Gradient
Motivation: Quadratic Model Problem, minimize
f(x) = ‖Ax − b‖^2
Optimality condition:
∇f(x∗) = 2A^T (Ax∗ − b) = 0
Obvious approach: solve the system of linear equations
A^T A x = A^T b
Descent direction
Consecutive directions d_i, . . . , d_{i+k} satisfy certain orthogonality or conjugacy conditions, with M = A^T A symmetric positive definite:
d_i^T M d_j = 0, i ≠ j
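For this quadratic model problem, the classical (linear) conjugate gradient method can be applied to the normal equations without ever forming M = A^T A explicitly; a minimal sketch under that assumption:

```python
import numpy as np

def cg_normal_equations(A, b, tol=1e-10, maxiter=None):
    """Conjugate gradient on the normal equations A^T A x = A^T b,
    i.e. the minimizer of f(x) = ||Ax - b||^2."""
    m, n = A.shape
    maxiter = maxiter or n
    x = np.zeros(n)
    r = A.T @ b - A.T @ (A @ x)   # residual of the normal equations
    d = r.copy()                  # first search direction
    rs_old = r @ r
    for _ in range(maxiter):
        Md = A.T @ (A @ d)        # M d with M = A^T A, applied without forming M
        alpha = rs_old / (d @ Md) # exact stepsize for the quadratic
        x += alpha * d
        r -= alpha * Md
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        d = r + (rs_new / rs_old) * d   # new direction, conjugate w.r.t. M
        rs_old = rs_new
    return x

# Example: least-squares fit, compared with numpy's reference solver.
rng = np.random.default_rng(1)
A = rng.normal(size=(20, 5))
b = rng.normal(size=20)
print(np.allclose(cg_normal_equations(A, b), np.linalg.lstsq(A, b, rcond=None)[0]))
```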
Nonlinear Conjugate Gradient
Initial Steps:
start at point x_0 with d_0 = −∇f(x_0)
perform exact line search, find t_0 = arg min_{t>0} f(x_0 + t d_0)
set x_1 = x_0 + t_0 d_0
Iteration:
set ∆_k = −∇f(x_k)
compute β_k via one of the available formulas (next slide)
update conjugate search direction d_k = ∆_k + β_k d_{k−1}
perform exact line search, find t_k = arg min_{t>0} f(x_k + t d_k)
set x_{k+1} = x_k + t_k d_k
Nonlinear Conjugate Gradient
Formulas for β_k:
Fletcher-Reeves: β_k^FR = (∆_k^T ∆_k) / (∆_{k−1}^T ∆_{k−1})
Polak-Ribière: β_k^PR = (∆_k^T (∆_k − ∆_{k−1})) / (∆_{k−1}^T ∆_{k−1})
Hestenes-Stiefel: β_k^HS = −(∆_k^T (∆_k − ∆_{k−1})) / (s_{k−1}^T (∆_k − ∆_{k−1}))
Dai-Yuan: β_k^DY = −(∆_k^T ∆_k) / (s_{k−1}^T (∆_k − ∆_{k−1}))
Reasonable choice with automatic direction reset:
β = max{0, β_k^PR}
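A sketch of the full nonlinear CG iteration using the Polak-Ribière formula with the max(0, ·) reset; using scipy's one-dimensional minimizer as a stand-in for the exact line search, and the bound on t, are my own assumptions for illustration:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def beta_pr(delta_k, delta_km1):
    """Polak-Ribière formula with automatic direction reset: max(0, beta^PR)."""
    return max(0.0, delta_k @ (delta_k - delta_km1) / (delta_km1 @ delta_km1))

def nonlinear_cg(f, grad, x0, tol=1e-6, maxiter=1000):
    """Nonlinear conjugate gradient with Polak-Ribière(+) updates and an
    approximate 'exact' line search via bounded one-dimensional minimization."""
    x = np.asarray(x0, dtype=float)
    delta = -grad(x)                 # Delta_0 = -grad f(x_0)
    d = delta.copy()
    for _ in range(maxiter):
        if np.linalg.norm(delta) < tol:
            break
        # line search: t_k = argmin_{t>0} f(x_k + t d_k)
        t = minimize_scalar(lambda t: f(x + t * d),
                            bounds=(0.0, 10.0), method="bounded").x
        x = x + t * d
        delta_new = -grad(x)         # Delta_{k+1} = -grad f(x_{k+1})
        beta = beta_pr(delta_new, delta)
        d = delta_new + beta * d     # conjugate search direction update
        delta = delta_new
    return x

# Example: the Rosenbrock function.
f = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
grad = lambda x: np.array([-2*(1 - x[0]) - 400*x[0]*(x[1] - x[0]**2),
                           200*(x[1] - x[0]**2)])
print(nonlinear_cg(f, grad, np.array([-1.2, 1.0])))
```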