Artificial Neural Network Training Algorithms
DR. MRINMOY MAJUMDER
QUASI-NEWTON
LEVENBERG-MARQUARDT
QUASI-NEWTON METHOD
• Newton's method is computationally expensive, as it requires
many operations to evaluate the Hessian matrix and compute
its inverse.
• Alternative approaches, known as quasi-Newton or variable
metric methods, have been developed to overcome this drawback.
• These methods, instead of calculating the Hessian directly and then
evaluating its inverse, build up an approximation to the inverse
Hessian at each iteration of the algorithm.
• The main idea behind the quasi-Newton method is to approximate
the inverse Hessian by another matrix using only the first
partial derivatives of the loss function.
Equation for Weight Update/Parameter Improvement
w(i+1) = w(i) − η(i) · G(i) · g(i)
where G(i) is the approximation of the inverse Hessian matrix, g(i) is the gradient of the loss, and η(i) is the training rate.
How?
1. 1st iteration: weights assigned randomly.
2. 2nd iteration: weights again assigned randomly.
3. 3rd iteration onwards: the weight update formula is used.
Symbols in the update formula:
• G: approximation of the inverse Hessian matrix
• η: learning rate, either constant or approximated by line minimization
• g: first-order gradient of the loss
Approximation of Inverse Hessian Matrix
The approximation G is built at each iteration from the change in the weights, s(n) = w(n+1) − w(n), and the change in the gradient, y(n) = g(n+1) − g(n), so that the updated matrix satisfies the secant condition G(n+1) · y(n) = s(n).
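The whole quasi-Newton procedure above can be sketched in a few lines of NumPy. The toy quadratic loss, the iteration budget, the halving line search, and the BFGS-style rank-two update below are illustrative assumptions; the slides do not specify which inverse-Hessian update is used.

```python
import numpy as np

def loss(w):
    # Toy quadratic loss standing in for a network's error surface.
    return 0.5 * w[0]**2 + 2.0 * w[1]**2

def grad(w):
    # First partial derivatives only; the exact Hessian is never formed.
    return np.array([w[0], 4.0 * w[1]])

w = np.array([3.0, -2.0])   # 1st iteration: weights assigned randomly (fixed here)
G = np.eye(2)               # initial approximation of the inverse Hessian
g = grad(w)

for _ in range(20):
    step = -G @ g           # quasi-Newton search direction
    eta = 1.0               # training rate, halved until the loss decreases
    while loss(w + eta * step) > loss(w) and eta > 1e-8:
        eta *= 0.5          # crude stand-in for line minimization
    w_new = w + eta * step
    g_new = grad(w_new)
    s, y = w_new - w, g_new - g   # weight change and gradient change
    if s @ y > 1e-12:       # BFGS rank-two update of the inverse Hessian
        rho = 1.0 / (s @ y)
        I = np.eye(2)
        G = (I - rho * np.outer(s, y)) @ G @ (I - rho * np.outer(y, s)) \
            + rho * np.outer(s, s)
    w, g = w_new, g_new
```

Only gradients enter the update of G, which is the point of the method: the Hessian and its inverse are never computed directly.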
Levenberg-Marquardt algorithm
The Levenberg-Marquardt algorithm, also known as the damped least-squares method,
• is designed to work specifically with loss functions that take the form of a sum of squared errors.
• It works without computing the exact Hessian matrix.
• Instead, it works with the Gradient vector and the Jacobian
matrix.
Procedural Diagram
Weight Update Procedure
Loss function: f = Σᵢ eᵢ² (sum of squared errors over all instances)
Gradient vector of the loss function: g = 2 · Jᵀ · e
Approximation of the Hessian matrix: H ≈ 2 · Jᵀ · J
where J is the m × n Jacobian matrix of the error terms, m is the number of instances in the data set, n is the number of weights, and e is the vector of all error terms.
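As a hedged illustration of these quantities, the sketch below evaluates the loss, the gradient vector, and the Hessian approximation for a toy straight-line model; the data and the model are assumptions made up for the example, not taken from the slides.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])   # generated exactly by y = 1 + 2x

w = np.array([0.0, 0.0])             # current weights (intercept, slope)
e = (w[0] + w[1] * x) - y            # e: vector of all error terms

f = np.sum(e**2)                     # loss function: sum of squared errors

# Jacobian of the error terms: m rows (instances) by n columns (weights)
J = np.column_stack([np.ones_like(x), x])

g = 2.0 * J.T @ e                    # gradient vector of the loss
H = 2.0 * J.T @ J                    # approximation of the Hessian matrix
```

Note that H is assembled from first derivatives alone: the Jacobian has one row per instance, so its size (and memory cost) grows with the data set, which is the drawback discussed later.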
Damping parameter
w(i+1) = w(i) − (H(i) + λ·I)⁻¹ · g(i)
λ is a damping factor that ensures the positive definiteness of (H + λ·I).
When the damping parameter λ is zero, this is just Newton's method,
using the approximate Hessian matrix.
On the other hand, when λ is large, this becomes gradient descent with
a small training rate.
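The two limiting behaviours of λ can be seen in a small sketch. The damped step −(H + λI)⁻¹·g and the toy Jacobian and error vector below are illustrative assumptions for a sum-of-squared-errors problem.

```python
import numpy as np

# Gauss-Newton quantities at some current weights (toy values).
J = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
e = np.array([-1.0, -3.0, -5.0, -7.0])
g = 2.0 * J.T @ e                    # gradient vector of the loss
H = 2.0 * J.T @ J                    # Hessian approximation

def lm_step(lmbda):
    # Damped Levenberg-Marquardt step: -(H + lambda*I)^-1 * g
    return -np.linalg.solve(H + lmbda * np.eye(H.shape[0]), g)

newton_like = lm_step(0.0)   # lambda = 0: Newton's method with approximate Hessian
gd_like = lm_step(1e6)       # large lambda: roughly -g/lambda, a tiny gradient step
```

With λ = 0 the step solves the least-squares problem in one jump; with a very large λ the matrix (H + λI) is dominated by λI, so the step shrinks toward −g/λ, i.e. gradient descent with a small training rate.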
Strengths and Weaknesses
As we have seen, the Levenberg-Marquardt algorithm is a method tailored to loss functions of the sum-of-squared-errors type.
That makes it very fast when training neural networks measured with that kind of error.
Drawbacks
• It cannot be applied to loss functions such as the root mean squared error or the cross-entropy error.
• It is not compatible with regularization terms.
• For very big data sets and neural networks, the Jacobian matrix becomes huge and therefore requires a lot of memory.
For these reasons, the Levenberg-Marquardt algorithm is not recommended for big data sets and/or big neural networks.
Comparisons
If our neural network has many thousands of parameters/weights, we can use gradient descent or conjugate gradient to save memory. If we have to train multiple neural networks with just a few thousand instances and a few hundred weights, the best choice may be the Levenberg-Marquardt algorithm. In all other situations, the quasi-Newton method will work well.
