Gradient Descent
Dr. M. Ramesh
Prof. & HOD CSE - Cyber Security
The Idea Behind Gradient Descent
● Purpose: Gradient Descent is used to minimize a function, typically the loss or
cost function in machine learning models. The goal is to find the optimal
parameters (e.g., weights in a neural network) that minimize the loss.
● Key Insight: The direction of steepest descent (the negative gradient) tells us how to
update the parameters to reduce the loss. By iteratively adjusting the parameters
in small steps, we move step by step toward a minimum of the loss function.
● Example: Consider a simple linear regression problem with a loss function that
measures the difference between predicted values and actual values (Mean
Squared Error). Gradient descent helps adjust the line's slope and intercept until
this loss is minimized.
What is the Gradient?
The gradient is a vector that points in the direction of the steepest ascent of the
loss function. The negative of this vector points in the direction of the steepest
descent.
Estimating the Gradient:
● In simple cases (e.g., linear regression), the gradient can be computed
analytically.
● In more complex scenarios (e.g., neural networks), backpropagation is used to
calculate gradients.
● In cases with large datasets, mini-batches or stochastic techniques are used to
estimate the gradient over small subsets of data.
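To make both routes concrete, here is a minimal NumPy sketch (the quadratic loss, synthetic data, and finite-difference step size are illustrative assumptions, not part of the slides) that computes the analytic gradient of an MSE loss and checks it against a numerical finite-difference estimate:

```python
import numpy as np

def loss(w, X, y):
    # Mean squared error of a linear model y_hat = X @ w
    return np.mean((X @ w - y) ** 2)

def analytic_grad(w, X, y):
    # Closed-form gradient of the MSE loss: (2/n) * X^T (X w - y)
    return 2 * X.T @ (X @ w - y) / len(y)

def numerical_grad(w, X, y, eps=1e-6):
    # Central finite differences, one coordinate at a time
    g = np.zeros_like(w)
    for i in range(len(w)):
        step = np.zeros_like(w)
        step[i] = eps
        g[i] = (loss(w + step, X, y) - loss(w - step, X, y)) / (2 * eps)
    return g

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)
w = np.zeros(3)
print(analytic_grad(w, X, y))    # the two estimates should agree closely
print(numerical_grad(w, X, y))
```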
Using the Gradient
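Every variant applies the same update rule: move the parameters a small step against the gradient, θ ← θ − η · ∇L(θ), where η is the learning rate. A minimal one-dimensional sketch (the toy function and learning rate are illustrative choices):

```python
def f(x):
    return (x - 3) ** 2              # toy loss with its minimum at x = 3

def grad_f(x):
    return 2 * (x - 3)               # derivative of f

x = 0.0                              # initial guess
learning_rate = 0.1
for step in range(50):
    x = x - learning_rate * grad_f(x)    # step against the gradient
print(x)                             # converges toward 3.0
```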
Choosing the Right Step Size (Learning Rate)
Importance of Step Size:
● Too Large: If the step size (learning rate) is too large, the algorithm might
overshoot the minimum, causing divergence (i.e., the loss increases).
● Too Small: If the step size is too small, convergence will be slow,
requiring many iterations to reach the minimum.
● Optimal Step Size: Selecting the right learning rate is crucial for efficient
training. Methods such as learning rate schedules (reducing the learning
rate over time) or adaptive learning rates (e.g., Adam, RMSprop) can
help.
Heuristics:
● Use cross-validation to experiment with different learning
rates.
● Start with a higher learning rate and gradually reduce it
(learning rate annealing).
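One simple way to apply the annealing heuristic is to shrink the learning rate by a fixed factor each epoch. A sketch of such a schedule (the initial rate and decay factor below are illustrative, not prescribed):

```python
initial_lr = 0.1
decay = 0.95                         # shrink the learning rate by 5% per epoch

for epoch in range(20):
    lr = initial_lr * (decay ** epoch)   # learning-rate annealing
    # ... run one epoch of gradient descent with this lr ...
    print(f"epoch {epoch:2d}  lr = {lr:.4f}")
```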
Using Gradient Descent to Fit Models
Example 1: Linear Regression
● Loss Function: Mean Squared Error (MSE).
● Gradient Descent: Adjusts the slope and intercept of the regression line until the error between
predicted and actual values is minimized.
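A minimal NumPy sketch of Example 1, fitting the slope and intercept by full-batch gradient descent on the MSE loss (the synthetic data, learning rate, and iteration count are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=200)
y = 2.5 * x + 1.0 + rng.normal(scale=1.0, size=200)   # true slope 2.5, intercept 1.0

slope, intercept = 0.0, 0.0
lr = 0.01
for _ in range(2000):
    error = slope * x + intercept - y
    grad_slope = 2 * np.mean(error * x)       # d(MSE)/d(slope)
    grad_intercept = 2 * np.mean(error)       # d(MSE)/d(intercept)
    slope -= lr * grad_slope
    intercept -= lr * grad_intercept

print(slope, intercept)   # should approach 2.5 and 1.0
```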
Example 2: Logistic Regression
● Loss Function: Binary Cross-Entropy.
● Gradient Descent: Finds the optimal decision boundary by adjusting weights to minimize
classification error.
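For Example 2, the binary cross-entropy loss with a sigmoid output has the simple gradient X^T(p − y)/n, used in the sketch below (the toy data and hyperparameters are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)     # toy binary labels

w, b, lr = np.zeros(2), 0.0, 0.5
for _ in range(1000):
    p = sigmoid(X @ w + b)                    # predicted probabilities
    grad_w = X.T @ (p - y) / len(y)           # gradient of binary cross-entropy
    grad_b = np.mean(p - y)
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)   # the learned weights define the decision boundary
```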
Example 3: Neural Networks
● Loss Functions: Cross-Entropy for classification, Mean Squared Error for regression tasks.
● Gradient Descent: Using backpropagation, the gradients are propagated backward through the
network to update the weights in each layer.
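For Example 3, deep learning libraries compute these gradients by backpropagation automatically. A minimal sketch assuming PyTorch (the tiny architecture, random data, and hyperparameters are placeholders):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

X = torch.randn(128, 4)              # placeholder inputs
y = torch.randn(128, 1)              # placeholder targets

for epoch in range(100):
    optimizer.zero_grad()            # clear gradients from the previous step
    loss = loss_fn(model(X), y)
    loss.backward()                  # backpropagation fills in all gradients
    optimizer.step()                 # gradient descent update of every weight
```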
Mini-Batch Gradient Descent:
● Instead of using the entire dataset, mini-batch gradient descent computes the gradient
on small batches (subsets) of data.
● Advantages:
○ Computationally efficient.
○ Strikes a balance between the accuracy of Batch Gradient Descent and the
noisy updates of SGD.
○ Common batch sizes: 32, 64, 128.
● Use Case: Widely used in deep learning frameworks (e.g., TensorFlow, PyTorch) as it
optimizes memory usage and allows for faster computation on GPUs.
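A sketch of the mini-batch loop itself, here with a batch size of 32 on a small linear-regression problem (the data and hyperparameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=1000)

w = np.zeros(3)
lr, batch_size = 0.05, 32
for epoch in range(20):
    order = rng.permutation(len(X))                 # shuffle once per epoch
    for start in range(0, len(X), batch_size):
        batch = order[start:start + batch_size]     # indices of one mini-batch
        Xb, yb = X[batch], y[batch]
        grad = 2 * Xb.T @ (Xb @ w - yb) / len(yb)   # gradient on the mini-batch
        w -= lr * grad

print(w)   # approaches the true coefficients [1.0, -2.0, 0.5]
```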
Stochastic Gradient Descent (SGD):
● In each iteration, SGD computes the gradient using a single randomly chosen data point.
● Advantages:
○ Very fast as it only processes one example per iteration.
○ Introduces noise in the gradient updates, which can help the algorithm escape local
minima and saddle points.
● Disadvantages:
○ Noisy updates can cause oscillations around the minimum rather than exact
convergence.
● Use Case: Suitable for large-scale problems where using the entire dataset is computationally
expensive.
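SGD is the same loop with a batch size of one: each update uses a single randomly chosen example, so the steps are cheap but noisy (again, the data and hyperparameters below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=1000)

w = np.zeros(3)
lr = 0.01
for epoch in range(10):
    for i in rng.permutation(len(X)):      # visit examples in random order
        xi, yi = X[i], y[i]
        grad = 2 * xi * (xi @ w - yi)      # gradient from a single example
        w -= lr * grad                     # noisy but very cheap update

print(w)   # hovers near [1.0, -2.0, 0.5] rather than converging exactly
```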
Comparison of Gradient Descent Methods
● Batch Gradient Descent: Uses the entire dataset for each update;
more accurate but computationally intensive.
● Mini-Batch Gradient Descent: Trades off between stability and
computational efficiency by using small batches; very popular in
practice.
● Stochastic Gradient Descent (SGD): Fast, noisy updates; useful for
large datasets and online learning.
Takeaways
● Gradient Descent is a versatile and widely used optimization
algorithm for training machine learning models.
● Proper tuning of the learning rate and choosing the right variant
(batch, mini-batch, or stochastic) are key to achieving efficient and
effective optimization.
